Posted to dev@hbase.apache.org by Andrew Purtell <ap...@apache.org> on 2017/10/10 18:10:55 UTC

[DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

The coprocessor API provides an environment method, bypass(), that, when
called from a preXXX hook, will cause the core code to skip all remaining
processing. This capability was introduced in HBASE-3348. Since that time I
think we have become more enlightened about the complications of this
feature. (Or, anyway, speaking for myself :)

Not all hooks provide the bypass semantic. The javadoc for each hook says
whether it does, but this can be missed. If you call bypass() in a hook
where it is not supported, it is a silent no-op. This can lead to a poor
developer experience.
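A minimal sketch of the behavior described above, assuming a toy host rather than real HBase code: Context, Observer, put() and flush() are hypothetical names. One host call site checks the bypass flag and the other never does, so the same bypass() call skips the core work in the first and is silently ignored in the second.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the current bypass() semantic; not HBase code. */
public class BypassModel {

    /** Hypothetical stand-in for an observer context carrying a bypass flag. */
    static final class Context {
        private boolean bypass;

        void bypass() { bypass = true; }

        boolean shouldBypass() { return bypass; }
    }

    /** Hypothetical coprocessor with one hook that honors bypass and one that does not. */
    interface Observer {
        void prePut(Context ctx, String row);

        void preFlush(Context ctx);
    }

    static final List<String> log = new ArrayList<>();

    /** Host path that checks the flag: bypass() skips the core put. */
    static void put(Observer cp, String row) {
        Context ctx = new Context();
        cp.prePut(ctx, row);
        if (ctx.shouldBypass()) {
            log.add("put skipped: " + row);
            return;
        }
        log.add("put applied: " + row);
    }

    /** Host path that never checks the flag: bypass() is silently ignored. */
    static void flush(Observer cp) {
        Context ctx = new Context();
        cp.preFlush(ctx);
        log.add("flush ran");
    }

    public static void main(String[] args) {
        Observer cp = new Observer() {
            @Override
            public void prePut(Context ctx, String row) { ctx.bypass(); }

            @Override
            public void preFlush(Context ctx) { ctx.bypass(); } // no effect, no error
        };
        put(cp, "r1");
        flush(cp);
        System.out.println(log); // [put skipped: r1, flush ran]
    }
}
```

Nothing tells the coprocessor author that the second call did nothing, which is the poor developer experience in question.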

Where bypass is supported, what is being bypassed is all of the core code
implementing the remainder of the operation. In order to understand what
calling bypass() will skip, a coprocessor implementer should read and
understand all of the remaining code and its nuances. Although I think this
is good practice for coprocessor developers in general, it demands a lot. I
think it would provide a much better developer experience if we didn't
allow bypass, even though it means, in theory, that a coprocessor would be a
lot more limited in some ways than before. What is skipped is extremely
version dependent. That core code will vary, perhaps significantly, even
between point releases. We do not promise consistent behavior for the bypass
semantic, even between point releases; to achieve that we could not change
any code between hook points. Therefore the coprocessor implementer becomes
an HBase core developer in practice as soon as they rely on bypass(). Every
release of HBase may break the assumption that the replacement for the
bypassed code takes care of all the necessary skipped concerns. Because
those concerns can change at any point, such an assumption is never safe.

I say "in theory" because I would be surprised if anyone is relying on the
bypass for the above reason. I seem to recall that Phoenix might use it in
one place to promote a normal mutation into an atomic operation, by
substituting one for the other, but if so that objective could be
reimplemented using their new locking manager.

-- 
Best regards,
Andrew

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

On Wed, Oct 11, 2017 at 11:57 AM, Andrew Purtell <ap...@apache.org>
wrote:

> > What about the Duo suggestion? Purge bypass flag and replace it w/ preXXX
> > in a few select methods returning a boolean on whether bypass? Would that
> > work? (Would have to figure metrics still).
>
> That would work.
>
>
Let me take a look at what'd be involved then.
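For reference, a minimal sketch of the alternative Anoop and Andrew raise in the quoted thread below: instead of bypassing scanner creation in a pre hook, the host always builds the core scanner and a post hook wraps it. The Scanner interface, openScanner() and upperCasing() are hypothetical stand-ins, not HBase's RegionObserver API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

public class WrapScannerSketch {

    /** Hypothetical minimal scanner: returns the next row, or null when exhausted. */
    interface Scanner {
        String next();
    }

    /** Core scanner the host always builds; no hook can replace this step. */
    static Scanner coreScanner(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Host: build the core scanner first, then let the post hook wrap it. */
    static Scanner openScanner(List<String> rows, UnaryOperator<Scanner> postScannerOpen) {
        Scanner core = coreScanner(rows);
        return postScannerOpen.apply(core);
    }

    /** Example wrapper that transforms each result from the delegate. */
    static Scanner upperCasing(Scanner delegate) {
        return () -> {
            String row = delegate.next();
            return row == null ? null : row.toUpperCase();
        };
    }

    /** Reads a scanner to exhaustion. */
    static List<String> drain(Scanner s) {
        List<String> out = new ArrayList<>();
        for (String r = s.next(); r != null; r = s.next()) {
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        Scanner s = openScanner(Arrays.asList("a", "b"), WrapScannerSketch::upperCasing);
        System.out.println(drain(s)); // [A, B]
    }
}
```

The coprocessor customizes results without ever skipping core code, so it does not need to know what the host does while creating the scanner.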

Thanks,

S




>
> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
>
> > The YARN Timeline Server has the FlowRunCoprocessor. It does bypass when a
> > user does a Get, returning instead the result of its own (Flow) Scan.
> > Not sure how we'd do an alternative here; Timeline Server is keeping Tags
> > internally.
> >
> >
> > On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > Rather than continue to support a weird bypass() which works in some
> > > places and not in others, perhaps we can substitute it with an exception?
> > > So if the coprocessor throws this exception in the pre hook, then where it
> > > is allowed we catch it and do the right thing, and where it is not allowed
> > > we don't catch it and the server aborts. This will at least improve the
> > > silent bypass() failure problem. I also don't like, in retrospect, that
> > > calling this environment method has magic side effects. Everyone
> > > understands how exceptions work, so it will be clearer.
> > >
> > >
> > We could do that, though throwing and catching exceptions would be costly.
> >
> > What about the Duo suggestion? Purge bypass flag and replace it w/ preXXX
> > in a few select methods returning a boolean on whether bypass? Would that
> > work? (Would have to figure metrics still).
> >
> >
> >
> > > In any case we should try to address the Tephra and Phoenix cases brought
> > > up in this discussion. It looks like we can find alternatives for them.
> > > Shall I file JIRAs to follow up?
> > >
> > >
> > >
> > On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
> > its own long encoding when writing Increments. Not sure how we'd do that
> > selectively.
> >
> > St.Ack
> >
> >
> >
> > > On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <pa...@gmail.com>
> > > wrote:
> > >
> > > > These examples are great.
> > > >
> > > > And I think for normal region operations such as get, put, delete,
> > > > checkAndXXX, increment, it is OK to bypass the real operation after
> > > > preXXX, as the semantic is clear enough. Instead of calling env.bypass,
> > > > maybe just letting these preXXX methods return a boolean is enough to
> > > > tell the HBase framework that we have already done the real operation,
> > > > so it can just give up and return?
> > > >
> > > > Thanks.
> > > >
> > > > 2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:
> > > >
> > > > > The Tephra TransactionProcessor CP makes use of bypass() in preDelete()
> > > > > to override handling of delete tombstones in a transactional way:
> > > > > https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
> > > > >
> > > > > The CDAP IncrementHandler CP also makes use of bypass() in preGetOp()
> > > > > and preIncrementAfterRowLock() to provide a transactional
> > > > > implementation of readless increments:
> > > > > https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
> > > > >
> > > > > What would be the alternate approach for these applications?  In both
> > > > > cases they need to impose their own semantics on the underlying
> > > > > KeyValue storage.  Is there a different way this can be done?
> > > > >
> > > > >
> > > > > On Tue, Oct 10, 2017 at 11:58 AM Anoop John <anoop.hbase@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Wrapping core scanners is different, right?  That can be done in post
> > > > > > hooks.  I have seen many use cases for this.  The question is about
> > > > > > the pre hooks where we have not yet created the core object (like a
> > > > > > scanner).  The CP pre code itself does the work of object creation,
> > > > > > and so the core code is bypassed.  Well, the wrapping can be done in
> > > > > > the pre hook also: first create the core object in the CP code
> > > > > > itself, then wrap it and return the wrapper.  I have seen one JIRA
> > > > > > issue where the usage was this way.  I believe the wrapping can be
> > > > > > done in the post hook in such cases as well.
> > > > > >
> > > > > > -Anoop-
> > > > > >
> > > > > > On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <apurtell@apache.org>
> > > > > > wrote:
> > > > > > > I think we should continue to support overriding function by object
> > > > > > > inheritance. I didn't mention this and am not proposing more than
> > > > > > > removing the bypass() semantic. No more, no less. Phoenix absolutely
> > > > > > > depends on being able to wrap core scanners and return the wrappers.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <anoop.hbase@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> When we say bypass the core code, it can be done today not only by
> > > > > > >> calling bypass() but also by returning a non-null object from some
> > > > > > >> of the pre hooks.  For example, if preScannerOpen() returns a
> > > > > > >> scanner object, we skip the remaining core code that creates the
> > > > > > >> scanner(s).  So does this proposal include this aspect too, and
> > > > > > >> remove any possible way for CP hook code to bypass the core code?
> > > > > > >> Am +1.
> > > > > > >>
> > > > > > >> -Anoop-
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Andrew
> > > > > > >
> > > > > > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > > > > > decrepit hands
> > > > > > >    - A23, Crosstalk
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

> What about the Duo suggestion? Purge bypass flag and replace it w/ preXXX
> in a few select methods returning a boolean on whether bypass? Would that
> work? (Would have to figure metrics still).

That would work.
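A minimal sketch of the boolean-return design, assuming a toy host (PutObserver and put() are hypothetical names, not HBase API): the pre hook's return value, rather than a side-effecting environment call, tells the host to skip the core write. In this sketch the first observer that returns true ends processing; how chained coprocessors and metrics should behave is left open, as in the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BooleanBypassSketch {

    /** Hypothetical pre hook: return true to say "I handled it, skip core". */
    interface PutObserver {
        boolean prePut(String row, List<String> store);
    }

    /** Host: the hook's return value, not a flag on the environment, decides. */
    static void put(List<PutObserver> observers, String row, List<String> store) {
        for (PutObserver o : observers) {
            if (o.prePut(row, store)) {
                return; // an observer claimed the operation; core put is skipped
            }
        }
        store.add(row); // core code path
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        PutObserver passThrough = (row, s) -> false;
        PutObserver replacing = (row, s) -> {
            s.add("cp:" + row); // the CP performs its own write instead
            return true;
        };
        put(Arrays.asList(passThrough), "r1", store);
        put(Arrays.asList(replacing), "r2", store);
        System.out.println(store); // [r1, cp:r2]
    }
}
```

Under the exception alternative, `return true` would instead become a throw that only hosts permitting the skip would catch; the return value avoids the exception cost raised in the thread, and a hook whose signature returns void simply has no way to ask for a skip.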


-- 
Best regards,
Andrew


Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Anoop John <an...@gmail.com>.

+1 for not allowing flush/compaction CPs to do bypass.  So which pre
hooks will support bypass now?  Only some mutation-related ones?  If you
have a list, please post it.  Better to add it to the HBASE-18770 JIRA.

-Anoop-
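A minimal sketch of the compaction point in Stack's quoted note below, assuming a toy host: the hook name echoes RegionObserver's preCompactSelection, but the signature here is simplified and hypothetical. No bypass flag is involved; a coprocessor that empties the candidate list just leaves nothing to compact, while flush exposes no way to skip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompactionSelectionSketch {

    /** Hypothetical selection hook: the CP may remove entries from the mutable list. */
    interface CompactionObserver {
        void preCompactSelection(List<String> candidates);
    }

    /** Host: an empty selection simply means there is nothing to compact. */
    static String compact(List<String> storeFiles, CompactionObserver cp) {
        List<String> candidates = new ArrayList<>(storeFiles); // mutable copy for the hook
        cp.preCompactSelection(candidates);
        return candidates.isEmpty() ? "no compaction" : "compacted " + candidates;
    }

    /** Host: the pre flush hook can observe, but nothing it does skips the flush. */
    static String flush(Runnable preFlush) {
        preFlush.run();
        return "flushed";
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("f1", "f2", "f3");
        System.out.println(compact(files, c -> c.removeIf(f -> f.equals("f3")))); // compacted [f1, f2]
        System.out.println(compact(files, List::clear)); // no compaction
        System.out.println(flush(() -> { })); // flushed
    }
}
```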

On Wed, Oct 25, 2017 at 10:23 AM, Stack <st...@duboce.net> wrote:
> I made a start on HBASE-18770. It has an edit of RegionObserver which denotes
> the methods that support bypass (unfortunately, because of the varied
> signatures, how bypass is signaled varies too). Would appreciate a
> once-over.
>
> Of note, a CP cannot bypass flush. Speak up if you think otherwise (or if you
> can think of a case where this is needed). My rationale is that CPs won't
> have enough insider knowledge to do memory accounting in a world of
> in-memory compactions and on/offheap memory in our hosting process. What ye
> reckon?
>
> Coprocessors have always been able to adjust what gets compacted in any run
> and even skirt compaction altogether by returning an empty set of files to
> compact. This works as it ever did.
>
> Thanks,
> S
>
>
>
> On Tue, Oct 17, 2017 at 9:46 PM, Stack <st...@duboce.net> wrote:
>
>> I was going to pick up on the bypass after HBASE-19007 lands, cleaning up
>> our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007
>> was going badly for a good while, but with lots of contributors and good
>> discussion I think we now have it). Shouldn't be too much longer.
>>
>> It's CP API, so I was figuring it for an alpha-4 item.
>>
>> St.Ack
>>
>> On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <pa...@gmail.com>
>> wrote:
>>
>>> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
>>>
>>> It may still be a week or two before alpha4, I think. The scan injection
>>> and flush/compaction trigger/track APIs are still unstable...
>>>
>>> 2017-10-18 6:12 GMT+08:00 Josh Elser <el...@apache.org>:
>>>
>>> > (catching up here)
>>> >
>>> > I'm glad to see you fine folks came to a conclusion around a reduced-scope
>>> > solution (correct me if I'm wrong). "Some" bypass mechanism would stay for
>>> > preXXX methods, and we'd remove it for the other methods? What exactly the
>>> > "bypass API" would be is up in the air, correct?
>>> >
>>> > Duo, maybe you could put the "current plan" on HBASE-18770 since
>>> > discussion appears to have died down?
>>> >
>>> > I was originally lamenting yet another big, sweeping change to CPs when I
>>> > had expected alpha-4 to have already landed. But, let me play devil's
>>> > advocate: is this something we still think is critical to do in alpha-4? I
>>> > can respect wanting to address all of these smells, but I'd worry it
>>> > delays us further.
>>> >
>>> >
>>> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>>> >
>>> >> Creating an exception is expensive, so it is not recommended in the
>>> >> normal case. A common trick is to create a global exception instance
>>> >> and always throw it, to avoid creating one every time, but I think it is
>>> >> friendlier to just use a return value?
>>> >>
>>> >> And for me, the bypass after preXXX for normal region operations is
>>> >> just equivalent to a 'cancel', which is very clear and easy to
>>> >> understand, so I think it is OK to add bypass support for them. And for
>>> >> compaction and flush too, it is OK to give CP users the ability to
>>> >> cancel the operation as the semantic is clear, although I'm not sure how
>>> >> CP users would use this feature.
>>> >>
>>> >> In general, I think we can provide bypass/cancel support in preXXX
>>> >> methods that sit at the very beginning of an operation.
>>> >>
>>> >> Thanks.
>>> >>
>>> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
>>> >>
>>> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
>>> >>>> use its own long encoding when writing Increments. Not sure how we'd
>>> >>>> do that selectively.
>>> >>>
>>> >>> If we can handle the rest of the trouble that you observed:
>>> >>>
>>> >>> 1) Lack of recognition and identification of when the key value to
>>> >>> increment doesn't exist
>>> >>> 2) Lack of the ability to set the timestamp of the updated key value
>>> >>>
>>> >>> then they might be able to make it work. Perhaps a conversion from HBase
>>> >>> native to Phoenix LONG encoding when processing results, in the wrapping
>>> >>> scanner, informed by schema metadata.
>>> >>>
>>> >>> Or, if we are keeping the bypass semantic in select places but
>>> >>> implementing it with something other than today's bypass() API
>>> >>> (please), this would be another candidate for where to keep it. Duo
>>> >>> suggests keeping the semantic in all of the basic RPC preXXX hooks for
>>> >>> query and mutation. We could redo those APIs to skip normal processing
>>> >>> based on a return value or exception, but otherwise drop bypass from all
>>> >>> the others. It will clean up areas of confusion, e.g. can I bypass
>>> >>> splits or flushes or not? Or what about this arcane hook in compaction?
>>> >>> Or [insert some deep hook here]? The answer would be: only RPC hooks
>>> >>> will early out, and only if you return this value or throw that
>>> >>> exception.
>>> >>>
>>> >>>
>>> >>>>>>>>>>
>>> >>>>>>>>> execution
>>> >>>>>>
>>> >>>>>>> ?   Am +1.
>>> >>>>>>>>>>
>>> >>>>>>>>>> -Anoop-
>>> >>>>>>>>>>
>>> >>>>
>>> >>>
>>> >>
>>>
>>
>>
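Anoop and Andrew separate bypassing core code from wrapping core objects: a post hook can take the scanner the core code has already built and hand back a wrapper around it, so nothing is skipped. A minimal Java sketch of that pattern follows. SimpleScanner, CoreScanner, UppercasingScanner and WrappingObserver are hypothetical, simplified stand-ins invented for illustration; they are not the real HBase RegionScanner or RegionObserver types, whose signatures are larger and vary by version.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified scanner interface (not the HBase RegionScanner API).
interface SimpleScanner {
  // Appends the next row's cells to out; returns false when no rows remain.
  boolean next(List<String> out);
}

// Stand-in for the scanner the core code builds, backed by in-memory rows.
class CoreScanner implements SimpleScanner {
  private final Iterator<List<String>> rows;

  CoreScanner(List<List<String>> rows) { this.rows = rows.iterator(); }

  @Override
  public boolean next(List<String> out) {
    if (!rows.hasNext()) {
      return false;
    }
    out.addAll(rows.next());
    return true;
  }
}

// Wrapper that delegates to the core scanner and post-processes its output
// instead of replacing the core read path.
class UppercasingScanner implements SimpleScanner {
  private final SimpleScanner delegate;

  UppercasingScanner(SimpleScanner delegate) { this.delegate = delegate; }

  @Override
  public boolean next(List<String> out) {
    List<String> batch = new ArrayList<>();
    boolean more = delegate.next(batch);
    for (String cell : batch) {
      out.add(cell.toUpperCase());
    }
    return more;
  }
}

// Hypothetical observer: the post hook receives the scanner the core code
// already created and returns a wrapper around it. Nothing is bypassed.
class WrappingObserver {
  SimpleScanner postScannerOpen(SimpleScanner coreScanner) {
    return new UppercasingScanner(coreScanner);
  }
}
```

Because the wrapper only transforms what the core scanner produces, changes to the core read path between releases are inherited rather than silently skipped, which is the property the bypass discussion is after.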

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

On Mon, Oct 30, 2017 at 1:56 PM, Stack <st...@duboce.net> wrote:

> On Mon, Oct 30, 2017 at 1:45 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
>> I think complete and bypass are separate considerations and complete can
>> be used universally while we've decided to make bypass work only in some
>> contexts.
>>
>> That said, we can consider removing the complete semantic. Let's pose the
>> same question we did about bypass. Does anyone use it? Can we live without
>> it? As you point out, security interrupts processing by throwing an
>> exception, which is meant to propagate all the way back to the user. It
>> simplifies the theory of operation for coprocessors if we can assume
>> either the entire chain will complete or one of the coprocessors in the
>> chain will throw an exception that not only terminates processing of the
>> rest of the chain but also the operation in progress.
>>
>>
>
> I'd be game for removing 'complete'.
>
> I can leave this question hanging a while. The purge would be internals
> and javadoc changes, so it can happen after alpha-4 (HBASE-19123 is the
> issue-to-purge).
>
>
Ok. No calls for the preservation of 'complete'.

HBASE-19123 "Purge 'complete' support from Coprocesor Observers" has a
patch up, and I'll commit it in the morning unless there is an objection.
It removes the 'complete' facility.
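With 'complete' gone, the observer chain follows the contract Andrew describes in the quoted message: either every observer's pre hook runs and the operation proceeds, or one observer throws and both the rest of the chain and the operation are abandoned. A minimal Java sketch of that contract, assuming hypothetical PreObserver, DeniedException and ObserverChain types; these are illustrations only, not the actual HBase coprocessor host classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical observer with a single pre hook (not an HBase interface).
interface PreObserver {
  // Runs before the operation; throwing aborts the rest of the chain and the operation.
  void pre(String op);
}

// Hypothetical denial signal, e.g. what an access check might throw.
class DeniedException extends RuntimeException {
  DeniedException(String message) { super(message); }
}

class ObserverChain {
  private final List<PreObserver> observers = new ArrayList<>();

  ObserverChain add(PreObserver observer) {
    observers.add(observer);
    return this;
  }

  // No 'complete' flag: every pre hook runs in registration order, then the
  // core operation runs. If any hook throws, later hooks are skipped, the core
  // operation is never invoked, and the exception propagates to the caller.
  <T> T execute(String op, Supplier<T> coreOperation) {
    for (PreObserver observer : observers) {
      observer.pre(op);
    }
    return coreOperation.get();
  }
}
```

In this model the only way to cut the chain short is an exception the client actually sees, which matches how the quoted messages say AccessController already signals a denial.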

Thanks,
S



> Thanks,
> S
>
>
>
>
>
>>
>> On Mon, Oct 30, 2017 at 10:46 AM, Stack <st...@duboce.net> wrote:
>>
>> > HBASE-18770 (bypass) is coming along (thanks for the helpful reviews so
>> > far!).
>> >
>> > Of note, I have changed the Coprocessor Observer 'complete' function so it
>> > is only available on 'bypassable' methods. Is this ok to do (I'm no expert
>> > on coprocessoring)? I do it in the name of KISS. Having any Coprocessor
>> > able to 'complete', overriding any Coprocessor that comes behind it in the
>> > processing chain, seems obnoxious. I can see the need if a Coprocessor is
>> > bypassable and has conjured an answer it wants to get back to the client
>> > without tainting by subsequent Coprocessors -- which seems to be how it is
>> > used in my survey of Coprocessor implementations -- but perhaps I am
>> > missing a use case? (AccessController throws an exception when access is
>> > denied). The downside of supporting 'complete' globally is more overrides
>> > internally, and the messaging gets a bit more muddled.
>> >
>> > Here is more on 'complete' in case you don't know what it is about. If a
>> > method's 'pre' hook is wrapped by 10 Coprocessor observers, each observer
>> > gets called one after the other before we go ahead and do the actual
>> > method invocation. If the first Coprocessor in the chain calls 'complete'
>> > in its context, we will skip calling the remaining 9 coprocessors and then
>> > go ahead and make the method invocation.
>> >
>> > Any opinions out there on 'complete'? Any objections to my only allowing
>> > 'complete' on bypassable methods?
>> >
>> > Thanks,
>> > St.Ack
>> >
>> >
>> >
>> > On Tue, Oct 24, 2017 at 9:53 PM, Stack <st...@duboce.net> wrote:
>> >
>> > > I made a start on HBASE-18770. It has an edit of RegionObserver that
>> > > denotes the methods that support bypass (unfortunately, because of the
>> > > varied signatures, how bypass is signaled varies too). Would appreciate
>> > > a once-over.
>> > >
>> > > Of note, a CP cannot bypass flush. Speak up if you think otherwise (or
>> > > can think of a case where this is needed). My rationale is CPs won't
>> > > have enough insider knowledge to do memory accounting in a world of
>> > > in-memory compactions, and on/offheap memory in our hosting process.
>> > > What ye reckon?
>> > >
>> > > Coprocessors have always been able to adjust what gets compacted in any
>> > > run and even skirt compaction altogether by returning an empty set of
>> > > files to compact. This works as it ever did.
>> > >
>> > > Thanks,
>> > > S
>> > >
>> > >
>> > >
>> > > On Tue, Oct 17, 2017 at 9:46 PM, Stack <st...@duboce.net> wrote:
>> > >
>> > >> I was going to pick up on the bypass after HBASE-19007 lands, cleaning
>> > >> up our exposure of Master/RegionServerServices to Coprocessors
>> > >> (HBASE-19007 was going badly for a good while, but with lots of
>> > >> contributors and good discussion I think we now have it). Shouldn't be
>> > >> too much longer.
>> > >>
>> > >> It's CP API, so I was figuring it an alpha-4 item.
>> > >>
>> > >> St.Ack
>> > >>
>> > >> On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <palomino219@gmail.com> wrote:
>> > >>
>> > >>> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
>> > >>>
>> > >>> It may still be a week or two before alpha4, I think. The scan
>> > >>> injection and flush/compaction trigger/track APIs are still unstable...
>> > >>>
>> > >>> 2017-10-18 6:12 GMT+08:00 Josh Elser <el...@apache.org>:
>> > >>>
>> > >>> > (catching up here)
>> > >>> >
>> > >>> > I'm glad to see you fine folks came to a conclusion around a
>> > >>> > reduced-scope solution (correct me if I'm wrong). "Some" bypass
>> > >>> > mechanism would stay for preXXX methods, and we'd remove it for the
>> > >>> > other methods? What exactly the "bypass API" would be is up in the
>> > >>> > air, correct?
>> > >>> >
>> > >>> > Duo -- maybe you could put the "current plan" on HBASE-18770 since
>> > >>> > discussion appears to have died down?
>> > >>> >
>> > >>> > I was originally lamenting yet another big, sweeping change to CPs
>> > >>> > when I had expected alpha-4 to have already landed. But, let me play
>> > >>> > devil's advocate: is this something we still think is critical to do
>> > >>> > in alpha-4? I can respect wanting to address all of these smells, but
>> > >>> > I worry it delays us further.
>> > >>> >
>> > >>> >
>> > >>> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>> > >>> >
>> > >>> >> Creating an exception is expensive, so it is not advisable to do it
>> > >>> >> in the normal case. A common trick is to create a global exception
>> > >>> >> instance and always throw it to avoid creating one every time, but I
>> > >>> >> think it is friendlier to just use a return value?
>> > >>> >>
>> > >>> >> And for me, the bypass after preXXX for normal region operations just
>> > >>> >> equals a 'cancel', which is very clear and easy to understand, so I
>> > >>> >> think it is OK to add bypass support for them. And also for
>> > >>> >> compaction and flush, it is OK to give CP users the ability to cancel
>> > >>> >> the operation as the semantic is clear, although I'm not sure how CP
>> > >>> >> users would use this feature.
>> > >>> >>
>> > >>> >> In general, I think we can provide bypass/cancel support in preXXX
>> > >>> >> methods where it is the very beginning of an operation.
>> > >>> >>
>> > >>> >> Thanks.
>> > >>> >>
>> > >>> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
>> > >>> >>
>> > >>> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants
>> > >>> >>>> to use its long encoding writing Increments. Not sure how we'd do
>> > >>> >>>> that, selectively.
>> > >>> >>>
>> > >>> >>> If we can handle the rest of the trouble that you observed:
>> > >>> >>>
>> > >>> >>> 1) Lack of recognition and identification of when the key value to
>> > >>> >>> increment doesn't exist
>> > >>> >>> 2) Lack of the ability to set the timestamp of the updated key value
>> > >>> >>>
>> > >>> >>> then they might be able to make it work. Perhaps a conversion from
>> > >>> >>> HBase native to Phoenix LONG encoding when processing results, in the
>> > >>> >>> wrapping scanner, informed by schema metadata.
>> > >>> >>>
>> > >>> >>> Or if we are keeping the bypass semantic in select places but
>> > >>> >>> implementing it with something other than today's bypass() API
>> > >>> >>> (please), this would be another candidate for where to keep it. Duo
>> > >>> >>> suggests keeping the semantic in all of the basic RPC preXXX hooks
>> > >>> >>> for query and mutation. We could redo those APIs to skip normal
>> > >>> >>> processing based on a return value or exception but otherwise drop
>> > >>> >>> bypass from all the others. It will clean up areas of confusion,
>> > >>> >>> e.g. can I bypass splits or flushes or not? Or what about this arcane
>> > >>> >>> hook in compaction? Or [insert some deep hook here]? The answer would
>> > >>> >>> be: only RPC hooks will early out, and only if you return this value,
>> > >>> >>> or throw that exception.
>> > >>> >>>
>> > >>> >>>
>> > >>> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
>> > >>> >>>
>> > >>> >>>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass
>> > >>> >>>> when a user does a Get, returning instead the result of its own
>> > >>> >>>> (Flow) Scan. Not sure how we'd do an alternative here; Timeline
>> > >>> >>>> Server is keeping Tags internally.
>> > >>> >>>>
>> > >>> >>>>
>> > >>> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <apurtell@apache.org> wrote:
>> > >>> >>>>
>> > >>> >>>>> Rather than continue to support a weird bypass() which works in
>> > >>> >>>>> some places and not in others, perhaps we can substitute it with
>> > >>> >>>>> an exception? So if the coprocessor throws this exception in the
>> > >>> >>>>> pre hook, then where it is allowed we catch it and do the right
>> > >>> >>>>> thing, and where it is not allowed we don't catch it and the
>> > >>> >>>>> server aborts. This will at least improve the silent bypass()
>> > >>> >>>>> failure problem. I also don't like, in retrospect, that calling
>> > >>> >>>>> this environment method has magic side effects. Everyone
>> > >>> >>>>> understands how exceptions work, so it will be clearer.
>> > >>> >>>>>
>> > >>> >>>>>
>> > >>> >>>> We could do that, though throwing and catching exceptions would be
>> > >>> >>>> costly.
>> > >>> >>>>
>> > >>> >>>> What about the Duo suggestion? Purge the bypass flag and replace it
>> > >>> >>>> w/ preXXX in a few select methods returning a boolean on whether to
>> > >>> >>>> bypass? Would that work? (Would still have to figure out metrics.)
>> > >>> >>>>
>> > >>> >>>>
>> > >>> >>>>
>> > >>> >>>>> In any case we should try to address the Tephra and Phoenix cases
>> > >>> >>>>> brought up in this discussion. It looks like we can find
>> > >>> >>>>> alternatives for them. Shall I file JIRAs to follow up?
>> > >>> >>>>>
>> > >>> >>>>>
>> > >>> >>>>>
>> > >>> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants
>> > >>> >>>> to use its long encoding writing Increments. Not sure how we'd do
>> > >>> >>>> that, selectively.
>> > >>> >>>>
>> > >>> >>>> St.Ack
>> > >>> >>>>
>> > >>> >>>>
>> > >>> >>>>
>> > >>> >>>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <palomino219@gmail.com> wrote:
>> > >>> >>>>>
>> > >>> >>>>>> These examples are great.
>> > >>> >>>>>>
>> > >>> >>>>>> And I think for normal region operations such as get, put,
>> > >>> >>>>>> delete, checkAndXXX, increment, it is OK to bypass the real
>> > >>> >>>>>> operation after preXXX as the semantic is clear enough. Instead
>> > >>> >>>>>> of calling env.bypass, maybe just letting these preXXX methods
>> > >>> >>>>>> return a boolean is enough to tell the HBase framework that we
>> > >>> >>>>>> have already done the real operation, so it should just give up
>> > >>> >>>>>> and return?
>> > >>> >>>>>>
>> > >>> >>>>>> Thanks.
>> > >>> >>>>>>
>> > >
>> >
>>
>>
>>
>> --
>> Best regards,
>> Andrew
>>
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>>    - A23, Crosstalk
>>
>
>
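Several messages quoted above (Duo's "return a boolean" suggestion, Stack's follow-up on purging the bypass flag, and Andrew's "only RPC hooks will early out" answer) point toward replacing env.bypass() with a return value on a few RPC-level pre hooks. A minimal Java sketch of that idea, assuming hypothetical GetObserver and ToyRegion types; it is purely illustrative and does not reproduce any actual HBase interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical get-path observer (not an HBase interface). Returning true from
// preGet means "I have already produced the result; skip the core read".
interface GetObserver {
  boolean preGet(String row, Map<String, String> result);
}

// Hypothetical toy region backed by a HashMap, standing in for a real region.
class ToyRegion {
  private final Map<String, String> store = new HashMap<>();
  private final GetObserver observer;

  ToyRegion(GetObserver observer) { this.observer = observer; }

  void put(String row, String value) { store.put(row, value); }

  // RPC-level entry point: the only place in this sketch that honors an
  // early out, and only via the documented return value.
  Map<String, String> get(String row) {
    Map<String, String> result = new HashMap<>();
    if (observer.preGet(row, result)) {
      return result; // observer supplied the answer; core read skipped
    }
    String value = store.get(row); // normal core processing
    if (value != null) {
      result.put(row, value);
    }
    return result;
  }
}
```

Because the early out is part of the hook's signature, a hook that does not support it simply has no boolean to return, so the silent no-op described in the opening message cannot occur.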

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

On Mon, Oct 30, 2017 at 1:45 PM, Andrew Purtell <ap...@apache.org> wrote:

> I think complete and bypass are separate considerations and complete can be
> used universally while we've decided to make bypass work only in some
> contexts.
>
> That said, we can consider removing the complete semantic. Let's pose the
> same question we did about bypass. Does anyone use it? Can we live without
> it? As you point out, security interrupts processing by throwing an
> exception, which is meant to propagate all the way back to the user. It
> simplifies the theory of operation for coprocessors if we can assume either
> the entire chain will complete or one of the coprocessors in the chain will
> throw an exception that not only terminates processing of the rest of the
> chain but also the operation in progress.
>
>

I'd be game for removing 'complete'.

I can leave this question hanging a while. The purge would be internals and
javadoc changes, so it can happen after alpha-4 (HBASE-19123 is the
issue-to-purge).

Thanks,
S
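Earlier in the thread, Duo noted that creating an exception on every call is expensive and that a common trick is to throw one preallocated instance, while Andrew proposed an exception that is caught where an early out is allowed and left uncaught elsewhere. A minimal Java sketch combining the two, assuming hypothetical BypassSignal, PreHook and HookRunner types (not HBase APIs). The shared instance uses the standard protected RuntimeException constructor that disables suppression and stack-trace capture, which is what keeps rethrowing it cheap.

```java
// Hypothetical control-flow signal a pre hook could throw to request an early
// out. One shared instance, with suppression and stack traces disabled.
final class BypassSignal extends RuntimeException {
  static final BypassSignal INSTANCE = new BypassSignal();

  private BypassSignal() {
    // message, cause, enableSuppression, writableStackTrace
    super("bypass", null, false, false);
  }
}

// Hypothetical pre hook shape used only for this sketch.
interface PreHook {
  void pre(String op);
}

class HookRunner {
  // Returns true when the hook asked to skip core processing.
  static boolean runPre(PreHook hook, String op, boolean bypassAllowed) {
    try {
      hook.pre(op);
      return false;
    } catch (BypassSignal signal) {
      if (!bypassAllowed) {
        // Stand-in for the abort Andrew described: a misplaced bypass
        // fails loudly instead of being a silent no-op.
        throw new IllegalStateException("bypass not supported for " + op, signal);
      }
      return true;
    }
  }
}
```

Wrapping the signal in an IllegalStateException where it is not allowed is a design choice of this sketch; rethrowing it unwrapped would also surface the error but would carry no stack trace.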





>
> On Mon, Oct 30, 2017 at 10:46 AM, Stack <st...@duboce.net> wrote:
>
> > HBASE-18770 (bypass) is coming along (thanks for the helpful reviews so
> > far!).
> >
> > Of note, I have changed the Coprocessor Observer 'complete' function so it
> > is only available on 'bypassable' methods. Is this ok to do (I'm no expert
> > on coprocessoring)? I do it in the name of KISS. Having any Coprocessor
> > able to 'complete', overriding any Coprocessor that comes behind it in the
> > processing chain, seems obnoxious. I can see the need if a Coprocessor is
> > bypassable and has conjured an answer it wants to get back to the client
> > without tainting by subsequent Coprocessors -- which seems to be how it is
> > used in my survey of Coprocessor implementations -- but perhaps I am
> > missing a use case? (AccessController throws an exception when access is
> > denied). The downside of supporting 'complete' globally is more overrides
> > internally, and the messaging gets a bit more muddled.
> >
> > Here is more on 'complete' in case you don't know what it is about. If a
> > method's 'pre' hook is wrapped by 10 Coprocessor observers, each observer
> > gets called one after the other before we go ahead and do the actual
> > method invocation. If the first Coprocessor in the chain calls 'complete'
> > in its context, we will skip calling the remaining 9 coprocessors and then
> > go ahead and make the method invocation.
> >
> > Any opinions out there on 'complete'? Any objections to my only allowing
> > 'complete' on bypassable methods?
> >
> > Thanks,
> > St.Ack
> >
> >
> >
> > On Tue, Oct 24, 2017 at 9:53 PM, Stack <st...@duboce.net> wrote:
> >
> > > I made a start on HBASE-18770. It has an edit of RegionObserver that
> > > denotes the methods that support bypass (unfortunately, because of the
> > > varied signatures, how bypass is signaled varies too). Would appreciate
> > > a once-over.
> > >
> > > Of note, a CP cannot bypass flush. Speak up if you think otherwise (or
> > > can think of a case where this is needed). My rationale is CPs won't
> > > have enough insider knowledge to do memory accounting in a world of
> > > in-memory compactions, and on/offheap memory in our hosting process.
> > > What ye reckon?
> > >
> > > Coprocessors have always been able to adjust what gets compacted in any
> > > run and even skirt compaction altogether by returning an empty set of
> > > files to compact. This works as it ever did.
> > >
> > > Thanks,
> > > S
> > >
> > >
> > >
> > > On Tue, Oct 17, 2017 at 9:46 PM, Stack <st...@duboce.net> wrote:
> > >
> > >> I was going to pick up on the bypass after HBASE-19007 lands, cleaning
> > up
> > >> our exposure of Master/RegionServerServices to Coprocessors
> (HBASE-19007
> > >> was going bad for a good while but lots of contributors and good
> > discussion
> > >> and now I think we have it). Shouldn't be too much longer.
> > >>
> > >> Its CP API so I was figuring it an alpha-4 item.
> > >>
> > >> St.Ack
> > >>
> > >> On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <palomino219@gmail.com
> >
> > >> wrote:
> > >>
> > >>> Fine. Let me change the title of HBASE-18770 and prepare a patch
> there.
> > >>>
> > >>> May still a week or two before alpha4 I think. The scan injection,
> and
> > >>> flush/compaction trigger/track API is still unstable...
> > >>>
> > >>> 2017-10-18 6:12 GMT+08:00 Josh Elser <el...@apache.org>:
> > >>>
> > >>> > (catching up here)
> > >>> >
> > >>> > I'm glad to see you fine folks came to a conclusion around a
> > >>> reduced-scope
> > >>> > solution (correct me if I'm wrong). "Some" bypass mechanism would
> > stay
> > >>> for
> > >>> > preXXX methods, and we'd remove it for the other methods? What
> > exactly
> > >>> the
> > >>> > "bypass API" would be is up in the air, correct?
> > >>> >
> > >>> > Duo -- maybe you could put the "current plan" on HBASE-18770 since
> > >>> > discussion appears to have died down?
> > >>> >
> > >>> > I was originally lamenting yet another big, sweeping change to CPs
> > >>> when I
> > >>> > had expected alpha-4 to have already landed. But, let me play
> devil's
> > >>> > advocate: is this something we still think is critical to do in
> > >>> alpha-4? I
> > >>> > can respect wanting to get address all of these smells, but I'd be
> > >>> worry it
> > >>> > delays us further.
> > >>> >
> > >>> >
> > >>> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
> > >>> >
> > >>> >> Creating an exception is expensive so if it is not suggested to do
> > it
> > >>> in a
> > >>> >> normal case. A common trick is to create a global exception
> > instance,
> > >>> and
> > >>> >> always throw it to avoid creating every time but I think it is
> more
> > >>> >> friendly to just use a return value?
> > >>> >>
> > >>> >> And for me, the bypass after preXXX for normal region operations
> > just
> > >>> >> equals to a 'cancel', which is very clear and easy to understand,
> > so I
> > >>> >> think it is OK to add bypass support for them. And also for
> > >>> compaction and
> > >>> >> flush, it is OK to give CP users the ability to cancel the
> operation
> > >>> as
> > >>> >> the
> > >>> >> semantic is clear, although I'm not sure how CP users would use
> this
> > >>> >> feature.
> > >>> >>
> > >>> >> In general, I think we can provide bypass/cancel support in preXXX
> > >>> methods
> > >>> >> where it is the very beginning of an operation.
> > >>> >>
> > >>> >> Thanks.
> > >>> >>
> > >>> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
> > >>> >>
> > >>> >> On Phoenix Increment by-pass, an ornery item is that Phoenix wants
> > to
> > >>> use
> > >>> >>>>
> > >>> >>> its long encoding writing Increments. Not sure how we'd do that,
> > >>> >>> selectively.
> > >>> >>>
> > >>> >>> If we can handle the rest of the trouble that you observed:
> > >>> >>>
> > >>> >>> 1) Lack of recognition and identification of when the key value
> to
> > >>> >>> increment doesn't exist
> > >>> >>> 2) Lack of the ability to set the timestamp of the updated key
> > value.
> > >>> >>>
> > >>> >>> then they might be able to make it work. Perhaps a conversion
> from
> > >>> HBase
> > >>> >>> native to Phoenix LONG encoding when processing results, in the
> > >>> wrapping
> > >>> >>> scanner, informed by schema metadata.
> > >>> >>>
> > >>> >>> Or if we are keeping the bypass semantic in select places but
> > >>> >>> implementing
> > >>> >>> it with something other than today's bypass() API (please) this
> > >>> would be
> > >>> >>> another candidate for where to keep it. Duo suggests keeping the
> > >>> semantic
> > >>> >>> in all of the basic RPC preXXX hooks for query and mutation. We
> > could
> > >>> >>> redo
> > >>> >>> those APIs to skip normal processing based on a return value or
> > >>> exception
> > >>> >>> but otherwise drop bypass from all the others. It will clean up
> > >>> areas of
> > >>> >>> confusion, e.g. can I bypass splits or flushes or not? Or what
> > about
> > >>> this
> > >>> >>> arcane hook in compaction? Or [insert some deep hook here]? The
> > >>> answer
> > >>> >>> would be: only RPC hooks will early out, and only if you return
> > this
> > >>> >>> value,
> > >>> >>> or throw that exception.
> > >>> >>>
> > >>> >>>
> > >>> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net>
> wrote:
> > >>> >>>
> > >>> >>> The YARN Timeline Server has the FlowRunCoprocessor. It does
> bypass
> > >>> when
> > >>> >>>> user does a Get returning instead the result of its own (Flow)
> > Scan
> > >>> >>>>
> > >>> >>> result.
> > >>> >>>
> > >>> >>>> Not sure how we'd do alternative here; Timeline Server is
> keeping
> > >>> Tags
> > >>> >>>> internally.
> > >>> >>>>
> > >>> >>>>
> > >>> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <
> > >>> apurtell@apache.org>
> > >>> >>>> wrote:
> > >>> >>>>
> > >>> >>>> Rather than continue to support a weird bypass() which works in
> > some
> > >>> >>>>>
> > >>> >>>> places
> > >>> >>>>
> > >>> >>>>> and not in others, perhaps we can substitute it with an
> > exception?
> > >>> So
> > >>> >>>>>
> > >>> >>>> if
> > >>> >>>
> > >>> >>>> the coprocessor throws this exception in the pre hook then where
> > it
> > >>> is
> > >>> >>>>> allowed we catch it and do the right thing, and where it is not
> > >>> allowed
> > >>> >>>>>
> > >>> >>>> we
> > >>> >>>>
> > >>> >>>>> don't catch it and the server aborts. This will at least
> improve
> > >>> the
> > >>> >>>>>
> > >>> >>>> silent
> > >>> >>>>
> > >>> >>>>> bypass() failure problem. I also don't like, in retrospect,
> that
> > >>> >>>>>
> > >>> >>>> calling
> > >>> >>>
> > >>> >>>> this environment method has magic side effects. Everyone
> > understands
> > >>> >>>>>
> > >>> >>>> how
> > >>> >>>
> > >>> >>>> exceptions work, so it will be clearer.
> > >>> >>>>>
> > >>> >>>>>
> > >>> >>>>> We could do that though throw and catch of exceptions would be
> > >>> costly.
> > >>> >>>>
> > >>> >>>> What about the Duo suggestion? Purge bypass flag and replace it
> w/
> > >>> >>>> preXXX
> > >>> >>>> in a few select methods returning a boolean on whether bypass?
> > Would
> > >>> >>>> that
> > >>> >>>> work? (Would have to figure metrics still).
> > >>> >>>>
> > >>> >>>>
> > >>> >>>>
> > >>> >>>> In any case we should try to address the Tephra and Phoenix
> cases
> > >>> >>>>>
> > >>> >>>> brought
> > >>> >>>
> > >>> >>>> up in this discussion. They look like we can find alternatives.
> > >>> Shall I
> > >>> >>>>> file JIRAs to follow up?
> > >>> >>>>>
> > >>> >>>>>
> > >>> >>>>>
> > >>> >>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix
> > wants
> > >>> to
> > >>> >>>> use
> > >>> >>>> its long encoding writing Increments. Not sure how we'd do that,
> > >>> >>>> selectively.
> > >>> >>>>
> > >>> >>>> St.Ack
> > >>> >>>>
> > >>> >>>>
> > >>> >>>>
> > >>> >>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <
> > >>> palomino219@gmail.com>
> > >>> >>>>> wrote:
> > >>> >>>>>
> > >>> >>>>> These examples are great.
> > >>> >>>>>>
> > >>> >>>>>> And I think for normal region operations such as get, put,
> > delete,
> > >>> >>>>>> checkAndXXX, increment, it is OK to bypass the real operation
> > >>> after
> > >>> >>>>>>
> > >>> >>>>> preXXX
> > >>> >>>>>
> > >>> >>>>>> as the semantic is clear enough. Instead of calling
> env.bypass,
> > >>> maybe
> > >>> >>>>>>
> > >>> >>>>> just
> > >>> >>>>>
> > >>> >>>>>> let these preXXX methods return a boolean is enough to tell
> the
> > >>> HBase
> > >>> >>>>>> framework that we have already done the real operation so just
> > >>> give
> > >>> >>>>>>
> > >>> >>>>> up
> > >>> >>>
> > >>> >>>> and
> > >>> >>>>>
> > >>> >>>>>> return?
> > >>> >>>>>>
> > >>> >>>>>> Thanks.
> > >>> >>>>>>
> > >>> >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <ghelmling@gmail.com
> >:
> > >>> >>>>>>
> > >>> >>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
> > >>> >>>>>>>
> > >>> >>>>>> preDelete()
> > >>> >>>>
> > >>> >>>>> to
> > >>> >>>>>>
> > >>> >>>>>>> override handling of delete tombstones in a transactional
> way:
> > >>> >>>>>>> https://github.com/apache/incubator-tephra/blob/master/
> > >>> >>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/
> > >>> >>>>>>>
> > >>> >>>>>> hbase/coprocessor/
> > >>> >>>>>>
> > >>> >>>>>>> TransactionProcessor.java#L244
> > >>> >>>>>>>
> > >>> >>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in
> > >>> >>>>>>>
> > >>> >>>>>> preGetOp()
> > >>> >>>
> > >>> >>>> and
> > >>> >>>>>
> > >>> >>>>>> preIncrementAfterRRowLock() to provide a transaction
> > >>> implementation
> > >>> >>>>>>>
> > >>> >>>>>> of
> > >>> >>>>
> > >>> >>>>> readless increments:
> > >>> >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
> > >>> >>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
> > >>> >>>>>>> hbase11/IncrementHandler.java#L121
> > >>> >>>>>>>
> > >>> >>>>>>> What would be the alternate approach for these applications?
> > In
> > >>> >>>>>>>
> > >>> >>>>>> both
> > >>> >>>
> > >>> >>>> cases
> > >>> >>>>>>
> > >>> >>>>>>> they need to impose their own semantics on the underlying
> > >>> KeyValue
> > >>> >>>>>>> storage.  Is there a different way this can be done?
> > >>> >>>>>>>
> > >>> >>>>>>>
> > >>> >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <
> > >>> anoop.hbase@gmail.com
> > >>> >>>>>>>
> > >>> >>>>>>
> > >>> >>>> wrote:
> > >>> >>>>>>
> > >>> >>>>>>>
> > >>> >>>>>>> Wrap core scanners is different right?  That can be done in
> > post
> > >>> >>>>>>>> hooks.  I have seen many use cases for this..  Its the
> > question
> > >>> >>>>>>>>
> > >>> >>>>>>> abt
> > >>> >>>
> > >>> >>>> the pre hooks where we have not yet created the core object
> (like
> > >>> >>>>>>>> scanner).  The CP pre code itself doing the work of object
> > >>> >>>>>>>>
> > >>> >>>>>>> creation
> > >>> >>>
> > >>> >>>> and so the core code is been bypassed.    Well the wrapping
> thing
> > >>> >>>>>>>>
> > >>> >>>>>>> can
> > >>> >>>>
> > >>> >>>>> be done in pre hook also. First create the core object by CP
> code
> > >>> >>>>>>>> itself and then do the wrapped object and return.. I have
> seen
> > >>> in
> > >>> >>>>>>>>
> > >>> >>>>>>> one
> > >>> >>>>
> > >>> >>>>> jira issue where the usage was this way..   The wrapping can be
> > >>> >>>>>>>>
> > >>> >>>>>>> done
> > >>> >>>>
> > >>> >>>>> in post also in such cases I believe.
> > >>> >>>>>>>>
> > >>> >>>>>>>> -Anoop-
> > >>> >>>>>>>>
> > >>> >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
> > >>> >>>>>>>>
> > >>> >>>>>>> apurtell@apache.org>
> > >>> >>>>>
> > >>> >>>>>> wrote:
> > >>> >>>>>>>>
> > >>> >>>>>>>>> I think we should continue to support overriding function
> by
> > >>> >>>>>>>>>
> > >>> >>>>>>>> object
> > >>> >>>>
> > >>> >>>>> inheritance. I didn't mention this and am not proposing more
> > >>> >>>>>>>>>
> > >>> >>>>>>>> than
> > >>> >>>
> > >>> >>>> removing
> > >>> >>>>>>>>
> > >>> >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely
> > >>> >>>>>>>>>
> > >>> >>>>>>>> depends
> > >>> >>>
> > >>> >>>> on
> > >>> >>>>>
> > >>> >>>>>> being
> > >>> >>>>>>>>
> > >>> >>>>>>>>> able to wrap core scanners and return the wrappers.
> > >>> >>>>>>>>>
> > >>> >>>>>>>>>
> > >>> >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
> > >>> >>>>>>>>>
> > >>> >>>>>>>> anoop.hbase@gmail.com>
> > >>> >>>>>
> > >>> >>>>>> wrote:
> > >>> >>>>>>>>
> > >>> >>>>>>>>>
> > >>> >>>>>>>>> When we say bypass the core code, it can be done today not
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>> only
> > >>> >>>
> > >>> >>>> by
> > >>> >>>>
> > >>> >>>>> calling bypass but by returning a not null object for some of
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>> the
> > >>> >>>>
> > >>> >>>>> pre
> > >>> >>>>>>
> > >>> >>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>> we
> > >>> >>>
> > >>> >>>> will
> > >>> >>>>>
> > >>> >>>>>> avoid the remaining core code execution for creation of the
> > >>> >>>>>>>>>> scanner(s).  So this proposal include this aspect also and
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>> remove
> > >>> >>>>
> > >>> >>>>> any
> > >>> >>>>>>
> > >>> >>>>>>> possible way of bypassing the core code by the CP hook code
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>> execution
> > >>> >>>>>>
> > >>> >>>>>>> ?   Am +1.
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>> -Anoop-
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>> apurtell@apache.org
> > >>> >>>>>>
> > >>> >>>>>>>
> > >>> >>>>>>>> wrote:
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> The coprocessor API provides an environment method,
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> bypass(),
> > >>> >>>
> > >>> >>>> that
> > >>> >>>>>
> > >>> >>>>>> when
> > >>> >>>>>>>>
> > >>> >>>>>>>>> called from a preXXX hook will cause the core code to skip
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> all
> > >>> >>>
> > >>> >>>> remaining
> > >>> >>>>>>>>
> > >>> >>>>>>>>> processing. This capability was introduced on HBASE-3348.
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> Since
> > >>> >>>>
> > >>> >>>>> this
> > >>> >>>>>>
> > >>> >>>>>>> time I
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> think we are more enlightened about the complications of
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> this
> > >>> >>>
> > >>> >>>> feature.
> > >>> >>>>>>>
> > >>> >>>>>>>> (Or,
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> anyway, speaking for myself:)
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is
> > the
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> case
> > >>> >>>>>
> > >>> >>>>>> the
> > >>> >>>>>>>
> > >>> >>>>>>>> javadoc for the hook says so, but it can be missed. If you
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> call
> > >>> >>>>
> > >>> >>>>> bypass()
> > >>> >>>>>>>>
> > >>> >>>>>>>>> in
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> a hook where it is not supported it is a no-op. This can
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> lead
> > >>> >>>
> > >>> >>>> to a
> > >>> >>>>>
> > >>> >>>>>> poor
> > >>> >>>>>>>>
> > >>> >>>>>>>>> developer experience.
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>>> Where bypass is supported what is being bypassed is all
> of
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> the
> > >>> >>>
> > >>> >>>> core
> > >>> >>>>>>
> > >>> >>>>>>> code
> > >>> >>>>>>>>
> > >>> >>>>>>>>> implementing the remainder of the operation. In order to
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> understand
> > >>> >>>>>>
> > >>> >>>>>>> what
> > >>> >>>>>>>>
> > >>> >>>>>>>>> calling bypass() will skip, a coprocessor implementer
> should
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> read
> > >>> >>>>>
> > >>> >>>>>> and
> > >>> >>>>>>>
> > >>> >>>>>>>> understand all of the remaining code and its nuances.
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> Although I
> > >>> >>>>
> > >>> >>>>> think
> > >>> >>>>>>>
> > >>> >>>>>>>> this
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> is good practice for coprocessor developers in general,
> it
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> demands a
> > >>> >>>>>>
> > >>> >>>>>>> lot. I
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> think it would provide a much better developer experience
> > if
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> we
> > >>> >>>>
> > >>> >>>>> didn't
> > >>> >>>>>>>
> > >>> >>>>>>>> allow bypass, even though it means - in theory - a
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> coprocessor
> > >>> >>>
> > >>> >>>> would
> > >>> >>>>>>
> > >>> >>>>>>> be a
> > >>> >>>>>>>>
> > >>> >>>>>>>>> lot more limited in some ways than before. What is skipped
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> is
> > >>> >>>
> > >>> >>>> extremely
> > >>> >>>>>>>>
> > >>> >>>>>>>>> version dependent. That core code will vary, perhaps
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> significantly,
> > >>> >>>>>>
> > >>> >>>>>>> even
> > >>> >>>>>>>>
> > >>> >>>>>>>>> between point releases. We do not provide the promise of
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> consistent
> > >>> >>>>>>
> > >>> >>>>>>> behavior even between point releases for the bypass
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> semantic.
> > >>> >>>
> > >>> >>>> To
> > >>> >>>>
> > >>> >>>>> achieve
> > >>> >>>>>>>>
> > >>> >>>>>>>>> that we could not change any code between hook points.
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> Therefore
> > >>> >>>>
> > >>> >>>>> the
> > >>> >>>>>>
> > >>> >>>>>>> coprocessor implementer becomes an HBase core developer in
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> practice
> > >>> >>>>>>
> > >>> >>>>>>> as
> > >>> >>>>>>>
> > >>> >>>>>>>> soon
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> as they rely on bypass(). Every release of HBase may
> break
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> the
> > >>> >>>
> > >>> >>>> assumption
> > >>> >>>>>>>>
> > >>> >>>>>>>>> that the replacement for the bypassed code takes care of
> all
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> necessary
> > >>> >>>>>>>
> > >>> >>>>>>>> skipped concerns. Because those concerns can change at any
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> point,
> > >>> >>>>>
> > >>> >>>>>> such an
> > >>> >>>>>>>>
> > >>> >>>>>>>>> assumption is never safe.
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>>> I say "in theory" because I would be surprised if anyone
> is
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> relying
> > >>> >>>>>>
> > >>> >>>>>>> on
> > >>> >>>>>>>
> > >>> >>>>>>>> the
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> bypass for the above reason. I seem to recall that
> Phoenix
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> might
> > >>> >>>>
> > >>> >>>>> use
> > >>> >>>>>>
> > >>> >>>>>>> it
> > >>> >>>>>>>>
> > >>> >>>>>>>>> in
> > >>> >>>>>>>>>>
> > >>> >>>>>>>>>>> one place to promote a normal mutation into an atomic
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> operation,
> > >>> >>>>
> > >>> >>>>> by
> > >>> >>>>>>
> > >>> >>>>>>> substituting one for the other, but if so that objective
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>> could
> > >>> >>>
> > >>> >>>> be
> > >>> >>>>>
> > >>> >>>>>> reimplemented using their new locking manager.
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>>>
> > >>> >>>>>>>>>>>
> > >>> >>>>
> > >>> >>>
> > >>> >>
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

I think complete and bypass are separate considerations and complete can be
used universally while we've decided to make bypass work only in some
contexts.

That said, we can consider removing the complete semantic. Let's pose the
same question we did about bypass. Does anyone use it? Can we live without
it? As you point out, security interrupts processing by throwing an
exception, which is meant to propagate all the way back to the user. It
simplifies the theory of operation for coprocessors if we can assume either
the entire chain will complete or one of the coprocessors in the chain will
throw an exception that not only terminates processing of the rest of the
chain but also the operation in progress.
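Here is a minimal sketch of the two ways a pre-hook chain can end early, as contrasted above. An observer that calls 'complete' stops the remaining observers, but the operation itself still runs. An observer that throws ends both the rest of the chain and the operation. `HookContext`, `PreHook`, and `ChainDemo` are hypothetical names used only for illustration. The sketch models the semantics under discussion, not HBase's actual coprocessor classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-invocation context handed to each observer in the chain.
final class HookContext {
    private boolean complete = false;

    // 'complete': ask the host not to call the remaining observers.
    void complete() { complete = true; }

    boolean isComplete() { return complete; }
}

// Hypothetical pre hook; throwing a RuntimeException aborts chain and operation.
interface PreHook {
    void pre(HookContext ctx);
}

final class ChainDemo {
    // Calls each hook in order, then the core operation. Returns what ran, so the
    // effect of complete() is visible; an exception from a hook escapes before
    // the core operation is reached.
    static List<String> run(List<PreHook> hooks, List<String> names) {
        List<String> executed = new ArrayList<>();
        HookContext ctx = new HookContext();
        for (int i = 0; i < hooks.size(); i++) {
            hooks.get(i).pre(ctx); // may throw: nothing after this point runs
            executed.add(names.get(i));
            if (ctx.isComplete()) {
                break; // skip the remaining observers...
            }
        }
        executed.add("coreOperation"); // ...but still perform the operation
        return executed;
    }

    public static void main(String[] args) {
        PreHook plain = ctx -> { };
        PreHook completing = HookContext::complete;
        List<String> names = Arrays.asList("cp1", "cp2", "cp3");
        System.out.println(run(Arrays.asList(plain, completing, plain), names));
        // prints [cp1, cp2, coreOperation]
        PreHook denying = ctx -> { throw new IllegalStateException("denied"); };
        try {
            run(Arrays.asList(plain, denying, plain), names);
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getMessage()); // prints aborted: denied
        }
    }
}
```

If 'complete' were dropped, only the first two outcomes would remain. The whole chain runs, or an exception ends everything.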


On Mon, Oct 30, 2017 at 10:46 AM, Stack <st...@duboce.net> wrote:

> HBASE-18770 (bypass) is coming along (thanks for the helpful reviews so
> far!).
>
> Of note, I have changed the Coprocessor Observer 'complete' function so it
> is only available on 'bypassable' methods. Is this OK to do (I'm no expert
> on coprocessoring)? I do it in the name of KISS. Having any Coprocessor
> able to 'complete', overriding every Coprocessor that comes behind it in
> the processing chain, seems obnoxious. I can see the need if a Coprocessor
> is bypassable and has conjured an answer it wants passed back to the client
> without tainting by subsequent Coprocessors -- which seems to be how it is
> used in my survey of Coprocessor implementations -- but perhaps I am
> missing a use case? (AccessController throws an exception when access is
> denied.) The downside of supporting 'complete' globally is more overrides
> internally, and the messaging gets a bit more muddled.
>
> Here is more on 'complete' in case you don't know what it is about. If a
> method's 'pre' hook is wrapped by 10 Coprocessor observers, each observer
> gets called one after the other before we go ahead and do the actual
> method invocation. If the first Coprocessor in the chain calls 'complete'
> in its context, we will skip calling the remaining 9 coprocessors and then
> go ahead and make the method invocation.
>
> Any opinions out there on 'complete'? Any objections to my only allowing
> 'complete' on bypassable methods?
>
> Thanks,
> St.Ack
>
>
>
> On Tue, Oct 24, 2017 at 9:53 PM, Stack <st...@duboce.net> wrote:
>
> > I made a start on HBASE-18770. It has an edit of RegionObserver that
> > denotes the methods that support bypass (unfortunately, because of the
> > varied signatures, how bypass is signaled varies too). Would appreciate a
> > once-over.
> >
> > Of note, a CP cannot bypass flush. Speak up if you think otherwise (or if
> > you can think of a case where this is needed). My rationale is CPs won't
> > have enough insider knowledge to do memory accounting in a world of
> > in-memory compactions, and on/offheap memory in our hosting process. What
> > ye reckon?
> >
> > Coprocessors have always been able to adjust what gets compacted in any
> > run and even skirt compaction altogether by returning an empty set of
> > files to compact. This works as it ever did.
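Here is a rough illustration of the compaction point above. It assumes a hypothetical selection hook that receives a mutable list of candidate files. Observers may prune that list, and an empty result means there is nothing to compact. `CompactionSelectionHook` and `CompactionDemo` are invented for this example and only approximate the shape of HBase's real selection hooks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a compaction-selection hook; not HBase's API.
interface CompactionSelectionHook {
    // May remove entries from 'candidates'; leaving it empty means "nothing to compact".
    void preSelect(String storeName, List<String> candidates);
}

final class CompactionDemo {
    // Returns the files that would actually be compacted; empty means the run is skipped.
    static List<String> selectFiles(String store, List<String> initial,
                                    List<CompactionSelectionHook> hooks) {
        List<String> candidates = new ArrayList<>(initial); // copy; caller's list is untouched
        for (CompactionSelectionHook hook : hooks) {
            hook.preSelect(store, candidates);
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("f1", "f2", "f3");
        // Example policies: never compact a store named "audit"; always leave "f2" out.
        CompactionSelectionHook skipAudit = (store, c) -> { if (store.equals("audit")) c.clear(); };
        CompactionSelectionHook dropF2 = (store, c) -> c.remove("f2");
        System.out.println(selectFiles("audit", files, Arrays.asList(skipAudit, dropF2))); // prints []
        System.out.println(selectFiles("data", files, Arrays.asList(skipAudit, dropF2)));  // prints [f1, f3]
    }
}
```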
> >
> > Thanks,
> > S
> >
> >
> >
> > On Tue, Oct 17, 2017 at 9:46 PM, Stack <st...@duboce.net> wrote:
> >
> >> I was going to pick up on the bypass after HBASE-19007 lands, cleaning
> >> up our exposure of Master/RegionServerServices to Coprocessors
> >> (HBASE-19007 was going bad for a good while, but after lots of
> >> contributors and good discussion I think we now have it). Shouldn't be
> >> too much longer.
> >>
> >> It's CP API, so I was figuring it an alpha-4 item.
> >>
> >> St.Ack
> >>
> >> On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <pa...@gmail.com>
> >> wrote:
> >>
> >>> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
> >>>
> >>> It may still be a week or two before alpha-4, I think. The scan
> >>> injection and flush/compaction trigger/track APIs are still unstable...
> >>>
> >>> 2017-10-18 6:12 GMT+08:00 Josh Elser <el...@apache.org>:
> >>>
> >>> > (catching up here)
> >>> >
> >>> > I'm glad to see you fine folks came to a conclusion around a
> >>> > reduced-scope solution (correct me if I'm wrong). "Some" bypass
> >>> > mechanism would stay for preXXX methods, and we'd remove it for the
> >>> > other methods? What exactly the "bypass API" would be is up in the
> >>> > air, correct?
> >>> >
> >>> > Duo -- maybe you could put the "current plan" on HBASE-18770 since
> >>> > discussion appears to have died down?
> >>> >
> >>> > I was originally lamenting yet another big, sweeping change to CPs
> >>> > when I had expected alpha-4 to have already landed. But, let me play
> >>> > devil's advocate: is this something we still think is critical to do
> >>> > in alpha-4? I can respect wanting to address all of these smells, but
> >>> > I'd worry it delays us further.
> >>> >
> >>> >
> >>> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
> >>> >
> >>> >> Creating an exception is expensive, so it is not advisable to do it
> >>> >> in the normal case. A common trick is to create a global exception
> >>> >> instance and always throw it to avoid creating one every time, but I
> >>> >> think it is more friendly to just use a return value?
> >>> >>
> >>> >> And for me, the bypass after preXXX for normal region operations just
> >>> >> equals a 'cancel', which is very clear and easy to understand, so I
> >>> >> think it is OK to add bypass support for them. And also for
> >>> >> compaction and flush, it is OK to give CP users the ability to cancel
> >>> >> the operation as the semantic is clear, although I'm not sure how CP
> >>> >> users would use this feature.
> >>> >>
> >>> >> In general, I think we can provide bypass/cancel support in preXXX
> >>> >> methods where it is the very beginning of an operation.
> >>> >>
> >>> >> Thanks.
> >>> >>
> >>> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
> >>> >>
> >>> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
> >>> >>>> use its long encoding writing Increments. Not sure how we'd do that,
> >>> >>>> selectively.
> >>> >>>
> >>> >>> If we can handle the rest of the trouble that you observed:
> >>> >>>
> >>> >>> 1) Lack of recognition and identification of when the key value to
> >>> >>> increment doesn't exist
> >>> >>> 2) Lack of the ability to set the timestamp of the updated key value
> >>> >>>
> >>> >>> then they might be able to make it work. Perhaps a conversion from
> >>> >>> HBase native to Phoenix LONG encoding when processing results, in the
> >>> >>> wrapping scanner, informed by schema metadata.
> >>> >>>
> >>> >>> Or, if we are keeping the bypass semantic in select places but
> >>> >>> implementing it with something other than today's bypass() API
> >>> >>> (please), this would be another candidate for where to keep it. Duo
> >>> >>> suggests keeping the semantic in all of the basic RPC preXXX hooks for
> >>> >>> query and mutation. We could redo those APIs to skip normal processing
> >>> >>> based on a return value or exception, but otherwise drop bypass from
> >>> >>> all the others. It will clean up areas of confusion, e.g. can I bypass
> >>> >>> splits or flushes or not? Or what about this arcane hook in
> >>> >>> compaction? Or [insert some deep hook here]? The answer would be: only
> >>> >>> RPC hooks will early out, and only if you return this value or throw
> >>> >>> that exception.
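The conversion floated above could be a per-value rewrite applied by the wrapping scanner. The sketch below rests on two assumptions that should be checked against the HBase and Phoenix sources before relying on it. First, HBase stores Increment values as 8-byte big-endian two's-complement longs. Second, Phoenix's LONG encoding is the same bytes with the sign bit flipped, so that unsigned byte order matches numeric order. `LongEncodingDemo` is hypothetical. The schema-metadata lookup that would decide which cells to convert is omitted.

```java
import java.nio.ByteBuffer;

// Assumptions (not verified here): HBase Increment values are 8-byte big-endian
// two's-complement longs; Phoenix LONG is the same bytes with the sign bit flipped.
final class LongEncodingDemo {
    // Encode a long in the assumed native Increment format.
    static byte[] encodeNative(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); // big-endian by default
    }

    // Convert an assumed native value to the assumed sign-flipped encoding.
    static byte[] nativeToSignFlipped(byte[] nativeValue) {
        if (nativeValue.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + nativeValue.length);
        }
        byte[] out = nativeValue.clone();
        out[0] ^= (byte) 0x80; // flip only the sign bit of the most significant byte
        return out;
    }

    // Unsigned lexicographic byte comparison, the order HBase uses for byte[] keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] minusOne = encodeNative(-1L);
        byte[] one = encodeNative(1L);
        // Native two's complement: -1 (0xFF...) sorts after 1 (0x00...01).
        System.out.println(compareUnsigned(minusOne, one) > 0); // prints true
        // Sign-flipped: -1 (0x7F...) sorts before 1 (0x80...01).
        System.out.println(compareUnsigned(nativeToSignFlipped(minusOne), nativeToSignFlipped(one)) < 0); // prints true
    }
}
```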
> >>> >>>
> >>> >>>
> >>> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
> >>> >>>
> >>> >>>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass
> >>> >>>> when the user does a Get, returning instead the result of its own
> >>> >>>> (Flow) Scan.
> >>> >>>>
> >>> >>>> Not sure how we'd do an alternative here; Timeline Server is keeping
> >>> >>>> Tags internally.
> >>> >>>>
> >>> >>>>
> >>> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <apurtell@apache.org>
> >>> >>>> wrote:
> >>> >>>>
> >>> >>>>> Rather than continue to support a weird bypass() which works in some
> >>> >>>>> places and not in others, perhaps we can substitute it with an
> >>> >>>>> exception? So if the coprocessor throws this exception in the pre
> >>> >>>>> hook, then where it is allowed we catch it and do the right thing,
> >>> >>>>> and where it is not allowed we don't catch it and the server aborts.
> >>> >>>>> This will at least improve on the silent bypass() failure problem. I
> >>> >>>>> also don't like, in retrospect, that calling this environment method
> >>> >>>>> has magic side effects. Everyone understands how exceptions work, so
> >>> >>>>> it will be clearer.
> >>> >>>>>
> >>> >>>>
> >>> >>>> We could do that, though throwing and catching exceptions would be
> >>> >>>> costly.
> >>> >>>>
> >>> >>>> What about the Duo suggestion? Purge the bypass flag and replace it w/
> >>> >>>> preXXX in a few select methods returning a boolean on whether to
> >>> >>>> bypass? Would that work? (Would have to figure metrics still.)
> >>> >>>>
> >>> >>>>> In any case we should try to address the Tephra and Phoenix cases
> >>> >>>>> brought up in this discussion. It looks like we can find
> >>> >>>>> alternatives for them. Shall I file JIRAs to follow up?
> >>> >>>>>
> >>> >>>>
> >>> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
> >>> >>>> use its long encoding writing Increments. Not sure how we'd do that,
> >>> >>>> selectively.
> >>> >>>>
> >>> >>>> St.Ack
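The exception-based alternative above, including the cost concern, can be pictured with a shared, preallocated exception built without a stack trace. This is the trick Duo mentions elsewhere in the thread, and it keeps each throw cheap. A bypassable operation catches the exception. An operation that does not support bypass lets it propagate, so the misuse is loud rather than silent. `BypassSignal`, `GetHook`, and `ExceptionBypassDemo` are hypothetical names. Here propagation to the caller stands in for the server abort that Andrew describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical control-flow signal; not part of HBase.
final class BypassSignal extends RuntimeException {
    private static final long serialVersionUID = 1L;

    // One shared instance: no allocation per throw, and no stack trace is captured
    // (enableSuppression = false, writableStackTrace = false).
    static final BypassSignal INSTANCE = new BypassSignal();

    private BypassSignal() {
        super("bypass", null, false, false);
    }
}

// Hypothetical pre hook for a read; throw BypassSignal.INSTANCE to skip the core read.
interface GetHook {
    void preGet(String row, List<String> results);
}

final class ExceptionBypassDemo {
    // Bypass is supported here: the host catches the signal and returns the hook's results.
    static List<String> get(String row, GetHook hook) {
        List<String> results = new ArrayList<>();
        try {
            hook.preGet(row, results);
        } catch (BypassSignal signal) {
            return results;
        }
        results.add("core:" + row); // stand-in for the real read
        return results;
    }

    // Bypass is not supported here: the signal is not caught, so it propagates to the
    // caller instead of being silently ignored.
    static void flush(Runnable preFlushHook) {
        preFlushHook.run();
        // core flush work would follow
    }

    public static void main(String[] args) {
        GetHook bypassing = (row, results) -> {
            results.add("cp:" + row);
            throw BypassSignal.INSTANCE;
        };
        System.out.println(get("r1", (row, results) -> { })); // prints [core:r1]
        System.out.println(get("r1", bypassing));             // prints [cp:r1]
        try {
            flush(() -> { throw BypassSignal.INSTANCE; });
        } catch (BypassSignal signal) {
            System.out.println("bypass is not allowed for flush");
        }
    }
}
```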
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <palomino219@gmail.com>
> >>> >>>>> wrote:
> >>> >>>>>
> >>> >>>>>> These examples are great.
> >>> >>>>>>
> >>> >>>>>> And I think for normal region operations such as get, put, delete,
> >>> >>>>>> checkAndXXX, increment, it is OK to bypass the real operation after
> >>> >>>>>> preXXX, as the semantic is clear enough. Instead of calling
> >>> >>>>>> env.bypass, maybe just letting these preXXX methods return a boolean
> >>> >>>>>> is enough to tell the HBase framework that we have already done the
> >>> >>>>>> real operation, so it should just give up and return?
> >>> >>>>>>
> >>> >>>>>> Thanks.
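Duo's return-value alternative can be sketched as follows. Each pre hook returns `true` when it has already produced the result, and the host then skips the core read. `BooleanGetHook` and `BooleanBypassDemo` are hypothetical names. Ending the observer chain on the first `true` is one possible design choice, not something the thread settled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical hook shape; not HBase's RegionObserver.
interface BooleanGetHook {
    // Return true if this hook has already produced the result and the core get
    // should be skipped; return false to continue normal processing.
    boolean preGet(String row, List<String> results);
}

final class BooleanBypassDemo {
    static List<String> get(String row, List<BooleanGetHook> hooks) {
        List<String> results = new ArrayList<>();
        for (BooleanGetHook hook : hooks) {
            if (hook.preGet(row, results)) {
                return results; // bypass: later hooks and the core get are skipped
            }
        }
        results.add("core:" + row); // stand-in for the real read
        return results;
    }

    public static void main(String[] args) {
        BooleanGetHook observeOnly = (row, results) -> false;
        BooleanGetHook answerItself = (row, results) -> {
            results.add("cp:" + row);
            return true;
        };
        System.out.println(get("r", Arrays.asList(observeOnly, observeOnly)));               // prints [core:r]
        System.out.println(get("r", Arrays.asList(observeOnly, answerItself, observeOnly))); // prints [cp:r]
    }
}
```

With this shape, a hook whose method returns `void` has no way to request a bypass. Whether bypass is supported becomes visible in the method signature instead of being a silent no-op at runtime.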
> >>> >>>>>>
> >>> >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:
> >>> >>>>>>
> >>> >>>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
> >>> >>>>>>> preDelete() to override handling of delete tombstones in a
> >>> >>>>>>> transactional way:
> >>> >>>>>>> https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
> >>> >>>>>>>
> >>> >>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in
> >>> >>>>>>> preGetOp() and preIncrementAfterRowLock() to provide a
> >>> >>>>>>> transactional implementation of readless increments:
> >>> >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
> >>> >>>>>>>
> >>> >>>>>>> What would be the alternate approach for these applications? In
> >>> >>>>>>> both cases they need to impose their own semantics on the
> >>> >>>>>>> underlying KeyValue storage. Is there a different way this can be
> >>> >>>>>>> done?
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <
> >>> anoop.hbase@gmail.com
> >>> >>>>>>>
> >>> >>>>>>
> >>> >>>> wrote:
> >>> >>>>>>
> >>> >>>>>>>
> >>> >>>>>>> Wrap core scanners is different right?  That can be done in
> post
> >>> >>>>>>>> hooks.  I have seen many use cases for this..  Its the
> question
> >>> >>>>>>>>
> >>> >>>>>>> abt
> >>> >>>
> >>> >>>> the pre hooks where we have not yet created the core object (like
> >>> >>>>>>>> scanner).  The CP pre code itself doing the work of object
> >>> >>>>>>>>
> >>> >>>>>>> creation
> >>> >>>
> >>> >>>> and so the core code is been bypassed.    Well the wrapping thing
> >>> >>>>>>>>
> >>> >>>>>>> can
> >>> >>>>
> >>> >>>>> be done in pre hook also. First create the core object by CP code
> >>> >>>>>>>> itself and then do the wrapped object and return.. I have seen
> >>> in
> >>> >>>>>>>>
> >>> >>>>>>> one
> >>> >>>>
> >>> >>>>> jira issue where the usage was this way..   The wrapping can be
> >>> >>>>>>>>
> >>> >>>>>>> done
> >>> >>>>
> >>> >>>>> in post also in such cases I believe.
> >>> >>>>>>>>
> >>> >>>>>>>> -Anoop-
> >>> >>>>>>>>
> >>> >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
> >>> >>>>>>>>
> >>> >>>>>>> apurtell@apache.org>
> >>> >>>>>
> >>> >>>>>> wrote:
> >>> >>>>>>>>
> >>> >>>>>>>>> I think we should continue to support overriding function by
> >>> >>>>>>>>>
> >>> >>>>>>>> object
> >>> >>>>
> >>> >>>>> inheritance. I didn't mention this and am not proposing more
> >>> >>>>>>>>>
> >>> >>>>>>>> than
> >>> >>>
> >>> >>>> removing
> >>> >>>>>>>>
> >>> >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely
> >>> >>>>>>>>>
> >>> >>>>>>>> depends
> >>> >>>
> >>> >>>> on
> >>> >>>>>
> >>> >>>>>> being
> >>> >>>>>>>>
> >>> >>>>>>>>> able to wrap core scanners and return the wrappers.
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
> >>> >>>>>>>>>
> >>> >>>>>>>> anoop.hbase@gmail.com>
> >>> >>>>>
> >>> >>>>>> wrote:
> >>> >>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>> When we say bypass the core code, it can be done today not
> >>> >>>>>>>>>>
> >>> >>>>>>>>> only
> >>> >>>
> >>> >>>> by
> >>> >>>>
> >>> >>>>> calling bypass but by returning a not null object for some of
> >>> >>>>>>>>>>
> >>> >>>>>>>>> the
> >>> >>>>
> >>> >>>>> pre
> >>> >>>>>>
> >>> >>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
> >>> >>>>>>>>>>
> >>> >>>>>>>>> we
> >>> >>>
> >>> >>>> will
> >>> >>>>>
> >>> >>>>>> avoid the remaining core code execution for creation of the
> >>> >>>>>>>>>> scanner(s).  So this proposal include this aspect also and
> >>> >>>>>>>>>>
> >>> >>>>>>>>> remove
> >>> >>>>
> >>> >>>>> any
> >>> >>>>>>
> >>> >>>>>>> possible way of bypassing the core code by the CP hook code
> >>> >>>>>>>>>>
> >>> >>>>>>>>> execution
> >>> >>>>>>
> >>> >>>>>>> ?   Am +1.
> >>> >>>>>>>>>>
> >>> >>>>>>>>>> -Anoop-



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

HBASE-18770 (bypass) is coming along (thanks for the helpful reviews so
far!).

Of note, I have changed the Coprocessor Observer 'complete' function so it
is only available on 'bypassable' methods. Is this ok to do (I'm no expert
on coprocessoring)? I do it in the name of KISS. Letting any Coprocessor
'complete', overriding every Coprocessor that comes after it in the
processing chain, seems obnoxious. I can see the need if a Coprocessor is
bypassable and has conjured an answer it wants returned to the client
without tainting by subsequent Coprocessors -- which seems to be how it is
used in my survey of Coprocessor implementations -- but perhaps I am
missing a use case? (AccessController throws an exception when access is
denied.) The downside of supporting 'complete' globally is more overrides
internally, and the messaging gets a bit more muddled.

Here is more on 'complete' in case you don't know what it is about. If a
method's 'pre' hook is wrapped by 10 Coprocessor observers, each observer
gets called one after the other before we go ahead and do the actual
method invocation. If the first Coprocessor in the chain calls 'complete'
in its context, we will skip calling the remaining 9 coprocessors and then
go ahead and make the method invocation.
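The 'complete' behavior described above can be modeled as a loop over the observer chain that shares one per-invocation context. What follows is a minimal sketch. PreHookChain, ObserverContext and Observer are hypothetical names and are not the real HBase classes. In this sketch, 'complete' stops calling the remaining observers but still lets the core method run, while 'bypass' lets every observer run but skips the core method. The sketch also makes complete() and bypass() throw on a non-bypassable hook instead of silently doing nothing. That choice is an assumption, meant to mirror the proposal to offer 'complete' only on bypassable methods.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a coprocessor pre-hook chain. PreHookChain,
 * ObserverContext and Observer are illustrative names, not HBase classes.
 */
public class PreHookChain {

  /** Per-invocation context shared by every observer in one chain run. */
  static final class ObserverContext {
    private final boolean bypassable; // does this hook support bypass/complete?
    private boolean completed;        // skip the observers that follow
    private boolean bypassed;         // skip the core method invocation

    ObserverContext(boolean bypassable) {
      this.bypassable = bypassable;
    }

    void complete() {
      requireBypassable("complete");
      completed = true;
    }

    void bypass() {
      requireBypassable("bypass");
      bypassed = true;
    }

    // Assumption: fail loudly rather than silently ignoring the request.
    private void requireBypassable(String op) {
      if (!bypassable) {
        throw new UnsupportedOperationException(op + "() is not supported by this hook");
      }
    }
  }

  interface Observer {
    void pre(ObserverContext ctx);
  }

  int invoked; // observers invoked during the last run

  /** Runs the pre hooks in order; returns true if the core method should run. */
  boolean runPreHooks(List<Observer> observers, boolean bypassable) {
    ObserverContext ctx = new ObserverContext(bypassable);
    invoked = 0;
    for (Observer o : observers) {
      o.pre(ctx);
      invoked++;
      if (ctx.completed) {
        break; // 'complete': the remaining observers are not called
      }
    }
    return !ctx.bypassed;
  }

  public static void main(String[] args) {
    List<Observer> chain = new ArrayList<>();
    chain.add(ctx -> ctx.complete()); // first of ten observers calls complete()
    for (int i = 0; i < 9; i++) {
      chain.add(ctx -> { });          // the other nine do nothing
    }
    PreHookChain runner = new PreHookChain();
    boolean runCore = runner.runPreHooks(chain, true);
    System.out.println(runner.invoked + " " + runCore); // prints "1 true"
  }
}
```

Running main prints `1 true`. Only the first of the ten observers is called, and the core method still runs, because 'complete' on its own does not imply bypass.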

Any opinions out there on 'complete'? Any objections to my only allowing
'complete' on bypassable methods?

Thanks,
St.Ack



On Tue, Oct 24, 2017 at 9:53 PM, Stack <st...@duboce.net> wrote:

> I made a start on HBASE-18770. It has an edit of RegionObserver that
> denotes the methods that support bypass (unfortunately, because of the
> varied signatures, how bypass is signaled varies too). Would appreciate a
> once-over.
>
> Of note, a CP cannot bypass flush. Speak up if you think otherwise (or if
> you can think of a case where this is needed). My rationale is CPs won't
> have enough insider knowledge to do memory accounting in a world of
> in-memory compactions and on/offheap memory in our hosting process. What
> ye reckon?
>
> Coprocessors have always been able to adjust what gets compacted in any
> run and even skirt compaction altogether by returning an empty set of files
> to compact. This works as it ever did.
>
> Thanks,
> S

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

I made a start on HBASE-18770. It has an edit of RegionObserver that denotes
the methods that support bypass (unfortunately, because of the varied
signatures, how bypass is signaled varies too). Would appreciate a
once-over.
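How bypass is signaled can depend on a hook's signature. Two styles come up in this thread. In the first, a pre hook returns a boolean meaning "already handled", as Duo suggested. In the second, a pre hook returns a non-null object that replaces the one the core code would otherwise create, which is the preScannerOpen() pattern Anoop described. What follows is a minimal sketch of both. Every name in it is hypothetical and is not the RegionObserver API. Two rules are assumptions made only for this sketch: the boolean bypass is accumulated across all observers, and the first non-null scanner wins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of two bypass-signaling styles. These names are
 * illustrative only; they are not the HBase RegionObserver API.
 */
public class BypassSignals {

  /** Minimal stand-in for a region scanner. */
  interface Scanner {
    List<String> next();
  }

  interface Observer {
    /** Boolean style: return true if this observer already produced the results. */
    default boolean preGet(String row, List<String> results) {
      return false;
    }

    /** Object style: a non-null scanner replaces the one the core code would create. */
    default Scanner preScannerOpen(String startRow) {
      return null;
    }
  }

  private final List<Observer> observers = new ArrayList<>();
  int coreGets;     // times the core Get path ran
  int coreScanners; // times the core scanner-creation path ran

  void addObserver(Observer o) {
    observers.add(o);
  }

  List<String> get(String row) {
    List<String> results = new ArrayList<>();
    boolean bypass = false;
    for (Observer o : observers) {
      bypass |= o.preGet(row, results); // every observer still runs
    }
    if (bypass) {
      return results; // skip the core Get
    }
    coreGets++;
    results.add("core:" + row);
    return results;
  }

  Scanner openScanner(String startRow) {
    for (Observer o : observers) {
      Scanner s = o.preScannerOpen(startRow);
      if (s != null) {
        return s; // first non-null scanner wins; core creation is skipped
      }
    }
    coreScanners++;
    return () -> Collections.singletonList("core-scan:" + startRow);
  }

  public static void main(String[] args) {
    BypassSignals region = new BypassSignals();
    System.out.println(region.get("r1")); // prints [core:r1]
    region.addObserver(new Observer() {
      @Override
      public boolean preGet(String row, List<String> results) {
        results.add("cp:" + row);
        return true; // "already handled": bypass the core Get
      }

      @Override
      public Scanner preScannerOpen(String startRow) {
        return () -> Collections.singletonList("cp-scan:" + startRow);
      }
    });
    System.out.println(region.get("r2"));               // prints [cp:r2]
    System.out.println(region.openScanner("a").next()); // prints [cp-scan:a]
    System.out.println(region.coreGets + " " + region.coreScanners); // prints "1 0"
  }
}
```

The two hooks express the same intent, "skip the core path", through different channels. That difference is the signature-dependent variation described above.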

Of note, a CP cannot bypass flush. Speak up if you think otherwise (or if you
can think of a case where this is needed). My rationale is CPs won't have
enough insider knowledge to do memory accounting in a world of in-memory
compactions and on/offheap memory in our hosting process. What ye reckon?

Coprocessors have always been able to adjust what gets compacted in any run
and even skirt compaction altogether by returning an empty set of files to
compact. This works as it ever did.
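The paragraph above says a coprocessor can change what gets compacted, or skip a compaction run entirely, by leaving no files to compact. What follows is a minimal sketch of that contract. It assumes observers receive a mutable candidate list that they may trim or clear. HBase's RegionObserver has a preCompactSelection hook for this purpose, but the class, method and file names below are simplified, hypothetical stand-ins and do not use its real signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in for a store's compaction path. The hook name mirrors
 * the idea of a compaction-selection pre hook but is not HBase's signature.
 */
public class CompactionSelectionSketch {

  interface Observer {
    /** Observers may drop entries from, or clear, the mutable candidate list. */
    void preCompactSelection(List<String> candidates);
  }

  private final List<Observer> observers = new ArrayList<>();
  int compactionsRun; // runs that actually had files to compact

  void addObserver(Observer o) {
    observers.add(o);
  }

  /** Returns the files selected for compaction; empty means the run was skipped. */
  List<String> compact(List<String> storeFiles) {
    List<String> candidates = new ArrayList<>(storeFiles); // caller's list is untouched
    for (Observer o : observers) {
      o.preCompactSelection(candidates);
    }
    if (!candidates.isEmpty()) {
      compactionsRun++; // the real merge of files would happen here
    }
    return candidates;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("f1", "f2", "f3");
    CompactionSelectionSketch store = new CompactionSelectionSketch();
    store.addObserver(c -> c.remove("f2"));   // adjust what gets compacted
    System.out.println(store.compact(files)); // prints [f1, f3]
    store.addObserver(c -> c.clear());        // skirt compaction altogether
    System.out.println(store.compact(files)); // prints []
    System.out.println(store.compactionsRun); // prints 1
  }
}
```

No bypass flag is needed here. An empty selection is an ordinary, well-defined outcome of the selection step.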

Thanks,
S



On Tue, Oct 17, 2017 at 9:46 PM, Stack <st...@duboce.net> wrote:

> I was going to pick up on the bypass after HBASE-19007 lands, cleaning up
> our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007
> was going bad for a good while, but with lots of contributors and good
> discussion I think we now have it). Shouldn't be too much longer.
>
> It's CP API, so I was figuring it for an alpha-4 item.
>
> St.Ack
>
> On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <pa...@gmail.com>
> wrote:
>
>> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
>>
>> It may still be a week or two before alpha4, I think. The scan injection
>> and flush/compaction trigger/track APIs are still unstable...
>>
>> 2017-10-18 6:12 GMT+08:00 Josh Elser <el...@apache.org>:
>>
>> > (catching up here)
>> >
>> > I'm glad to see you fine folks came to a conclusion around a
>> > reduced-scope solution (correct me if I'm wrong). "Some" bypass
>> > mechanism would stay for preXXX methods, and we'd remove it for the
>> > other methods? What exactly the "bypass API" would be is up in the air,
>> > correct?
>> >
>> > Duo -- maybe you could put the "current plan" on HBASE-18770 since
>> > discussion appears to have died down?
>> >
>> > I was originally lamenting yet another big, sweeping change to CPs when
>> > I had expected alpha-4 to have already landed. But, let me play devil's
>> > advocate: is this something we still think is critical to do in
>> > alpha-4? I can respect wanting to address all of these smells, but I'd
>> > worry it delays us further.
>> >
>> >
>> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>> >
>> >> Creating an exception is expensive, so it is not recommended to do it
>> >> in the normal case. A common trick is to create a global exception
>> >> instance and always throw it, to avoid creating one every time, but I
>> >> think it is friendlier to just use a return value?
>> >>
>> >> And for me, the bypass after preXXX for normal region operations is
>> >> just equivalent to a 'cancel', which is very clear and easy to
>> >> understand, so I think it is OK to add bypass support for them. And
>> >> also for compaction and flush, it is OK to give CP users the ability to
>> >> cancel the operation as the semantic is clear, although I'm not sure
>> >> how CP users would use this feature.
>> >>
>> >> In general, I think we can provide bypass/cancel support in preXXX
>> >> methods where it is the very beginning of an operation.
>> >>
>> >> Thanks.
>> >>
>> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
>> >>
>> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
>> >>>> use its long encoding writing Increments. Not sure how we'd do that,
>> >>>> selectively.
>> >>>
>> >>> If we can handle the rest of the trouble that you observed:
>> >>>
>> >>> 1) Lack of recognition and identification of when the key value to
>> >>> increment doesn't exist
>> >>> 2) Lack of the ability to set the timestamp of the updated key value.
>> >>>
>> >>> then they might be able to make it work. Perhaps a conversion from
>> >>> HBase native to Phoenix LONG encoding when processing results, in the
>> >>> wrapping scanner, informed by schema metadata.
>> >>>
>> >>> Or if we are keeping the bypass semantic in select places but
>> >>> implementing it with something other than today's bypass() API
>> >>> (please), this would be another candidate for where to keep it. Duo
>> >>> suggests keeping the semantic in all of the basic RPC preXXX hooks for
>> >>> query and mutation. We could redo those APIs to skip normal processing
>> >>> based on a return value or exception but otherwise drop bypass from all
>> >>> the others. It will clean up areas of confusion, e.g. can I bypass
>> >>> splits or flushes or not? Or what about this arcane hook in compaction?
>> >>> Or [insert some deep hook here]? The answer would be: only RPC hooks
>> >>> will early out, and only if you return this value, or throw that
>> >>> exception.
>> >>>
>> >>>
>> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
>> >>>
>> >>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass
>> when
>> >>>> user does a Get returning instead the result of its own (Flow) Scan
>> >>>>
>> >>> result.
>> >>>
>> >>>> Not sure how we'd do alternative here; Timeline Server is keeping
>> Tags
>> >>>> internally.
>> >>>>
>> >>>>
>> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <
>> apurtell@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>> Rather than continue to support a weird bypass() which works in some
>> >>>>>
>> >>>> places
>> >>>>
>> >>>>> and not in others, perhaps we can substitute it with an exception?
>> So
>> >>>>>
>> >>>> if
>> >>>
>> >>>> the coprocessor throws this exception in the pre hook then where it
>> is
>> >>>>> allowed we catch it and do the right thing, and where it is not
>> allowed
>> >>>>>
>> >>>> we
>> >>>>
>> >>>>> don't catch it and the server aborts. This will at least improve the
>> >>>>>
>> >>>> silent
>> >>>>
>> >>>>> bypass() failure problem. I also don't like, in retrospect, that
>> >>>>>
>> >>>> calling
>> >>>
>> >>>> this environment method has magic side effects. Everyone understands
>> >>>>>
>> >>>> how
>> >>>
>> >>>> exceptions work, so it will be clearer.
>> >>>>>
>> >>>>>
>> >>>>> We could do that though throw and catch of exceptions would be
>> costly.
>> >>>>
>> >>>> What about the Duo suggestion? Purge bypass flag and replace it w/
>> >>>> preXXX
>> >>>> in a few select methods returning a boolean on whether bypass? Would
>> >>>> that
>> >>>> work? (Would have to figure metrics still).
>> >>>>
>> >>>>
>> >>>>
>> >>>> In any case we should try to address the Tephra and Phoenix cases
>> >>>>>
>> >>>> brought
>> >>>
>> >>>> up in this discussion. They look like we can find alternatives.
>> Shall I
>> >>>>> file JIRAs to follow up?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants
>> to
>> >>>> use
>> >>>> its long encoding writing Increments. Not sure how we'd do that,
>> >>>> selectively.
>> >>>>
>> >>>> St.Ack
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <
>> palomino219@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> These examples are great.
>> >>>>>>
>> >>>>>> And I think for normal region operations such as get, put, delete,
>> >>>>>> checkAndXXX, increment, it is OK to bypass the real operation after
>> >>>>>>
>> >>>>> preXXX
>> >>>>>
>> >>>>>> as the semantic is clear enough. Instead of calling env.bypass,
>> maybe
>> >>>>>>
>> >>>>> just
>> >>>>>
>> >>>>>> let these preXXX methods return a boolean is enough to tell the
>> HBase
>> >>>>>> framework that we have already done the real operation so just give
>> >>>>>>
>> >>>>> up
>> >>>
>> >>>> and
>> >>>>>
>> >>>>>> return?
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>>
>> >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:
>> >>>>>>
>> >>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
>> >>>>>>>
>> >>>>>> preDelete()
>> >>>>
>> >>>>> to
>> >>>>>>
>> >>>>>>> override handling of delete tombstones in a transactional way:
>> >>>>>>> https://github.com/apache/incubator-tephra/blob/master/
>> >>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/
>> >>>>>>>
>> >>>>>> hbase/coprocessor/
>> >>>>>>
>> >>>>>>> TransactionProcessor.java#L244
>> >>>>>>>
>> >>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in
>> >>>>>>>
>> >>>>>> preGetOp()
>> >>>
>> >>>> and
>> >>>>>
>> >>>>>> preIncrementAfterRRowLock() to provide a transaction implementation
>> >>>>>>>
>> >>>>>> of
>> >>>>
>> >>>>> readless increments:
>> >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
>> >>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
>> >>>>>>> hbase11/IncrementHandler.java#L121
>> >>>>>>>
>> >>>>>>> What would be the alternate approach for these applications?  In
>> >>>>>>>
>> >>>>>> both
>> >>>
>> >>>> cases
>> >>>>>>
>> >>>>>>> they need to impose their own semantics on the underlying KeyValue
>> >>>>>>> storage.  Is there a different way this can be done?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <
>> anoop.hbase@gmail.com
>> >>>>>>>
>> >>>>>>
>> >>>> wrote:
>> >>>>>>
>> >>>>>>>
>> >>>>>>> Wrap core scanners is different right?  That can be done in post
>> >>>>>>>> hooks.  I have seen many use cases for this..  Its the question
>> >>>>>>>>
>> >>>>>>> abt
>> >>>
>> >>>> the pre hooks where we have not yet created the core object (like
>> >>>>>>>> scanner).  The CP pre code itself doing the work of object
>> >>>>>>>>
>> >>>>>>> creation
>> >>>
>> >>>> and so the core code is been bypassed.    Well the wrapping thing
>> >>>>>>>>
>> >>>>>>> can
>> >>>>
>> >>>>> be done in pre hook also. First create the core object by CP code
>> >>>>>>>> itself and then do the wrapped object and return.. I have seen in
>> >>>>>>>>
>> >>>>>>> one
>> >>>>
>> >>>>> jira issue where the usage was this way..   The wrapping can be
>> >>>>>>>>
>> >>>>>>> done
>> >>>>
>> >>>>> in post also in such cases I believe.
>> >>>>>>>>
>> >>>>>>>> -Anoop-
>> >>>>>>>>
>> >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
>> >>>>>>>>
>> >>>>>>> apurtell@apache.org>
>> >>>>>
>> >>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> I think we should continue to support overriding function by
>> >>>>>>>>>
>> >>>>>>>> object
>> >>>>
>> >>>>> inheritance. I didn't mention this and am not proposing more
>> >>>>>>>>>
>> >>>>>>>> than
>> >>>
>> >>>> removing
>> >>>>>>>>
>> >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely
>> >>>>>>>>>
>> >>>>>>>> depends
>> >>>
>> >>>> on
>> >>>>>
>> >>>>>> being
>> >>>>>>>>
>> >>>>>>>>> able to wrap core scanners and return the wrappers.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
>> >>>>>>>>>
>> >>>>>>>> anoop.hbase@gmail.com>
>> >>>>>
>> >>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> When we say bypass the core code, it can be done today not
>> >>>>>>>>>>
>> >>>>>>>>> only
>> >>>
>> >>>> by
>> >>>>
>> >>>>> calling bypass but by returning a not null object for some of
>> >>>>>>>>>>
>> >>>>>>>>> the
>> >>>>
>> >>>>> pre
>> >>>>>>
>> >>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
>> >>>>>>>>>>
>> >>>>>>>>> we
>> >>>
>> >>>> will
>> >>>>>
>> >>>>>> avoid the remaining core code execution for creation of the
>> >>>>>>>>>> scanner(s).  So this proposal include this aspect also and
>> >>>>>>>>>>
>> >>>>>>>>> remove
>> >>>>
>> >>>>> any
>> >>>>>>
>> >>>>>>> possible way of bypassing the core code by the CP hook code
>> >>>>>>>>>>
>> >>>>>>>>> execution
>> >>>>>>
>> >>>>>>> ?   Am +1.
>> >>>>>>>>>>
>> >>>>>>>>>> -Anoop-
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <
>> >>>>>>>>>>
>> >>>>>>>>> apurtell@apache.org
>> >>>>>>
>> >>>>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> The coprocessor API provides an environment method,
>> >>>>>>>>>>>
>> >>>>>>>>>> bypass(),
>> >>>
>> >>>> that
>> >>>>>
>> >>>>>> when
>> >>>>>>>>
>> >>>>>>>>> called from a preXXX hook will cause the core code to skip
>> >>>>>>>>>>>
>> >>>>>>>>>> all
>> >>>
>> >>>> remaining
>> >>>>>>>>
>> >>>>>>>>> processing. This capability was introduced on HBASE-3348.
>> >>>>>>>>>>>
>> >>>>>>>>>> Since
>> >>>>
>> >>>>> this
>> >>>>>>
>> >>>>>>> time I
>> >>>>>>>>>>
>> >>>>>>>>>>> think we are more enlightened about the complications of
>> >>>>>>>>>>>
>> >>>>>>>>>> this
>> >>>
>> >>>> feature.
>> >>>>>>>
>> >>>>>>>> (Or,
>> >>>>>>>>>>
>> >>>>>>>>>>> anyway, speaking for myself:)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is the
>> >>>>>>>>>>>
>> >>>>>>>>>> case
>> >>>>>
>> >>>>>> the
>> >>>>>>>
>> >>>>>>>> javadoc for the hook says so, but it can be missed. If you
>> >>>>>>>>>>>
>> >>>>>>>>>> call
>> >>>>
>> >>>>> bypass()
>> >>>>>>>>
>> >>>>>>>>> in
>> >>>>>>>>>>
>> >>>>>>>>>>> a hook where it is not supported it is a no-op. This can
>> >>>>>>>>>>>
>> >>>>>>>>>> lead
>> >>>
>> >>>> to a
>> >>>>>
>> >>>>>> poor
>> >>>>>>>>
>> >>>>>>>>> developer experience.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Where bypass is supported what is being bypassed is all of
>> >>>>>>>>>>>
>> >>>>>>>>>> the
>> >>>
>> >>>> core
>> >>>>>>
>> >>>>>>> code
>> >>>>>>>>
>> >>>>>>>>> implementing the remainder of the operation. In order to
>> >>>>>>>>>>>
>> >>>>>>>>>> understand
>> >>>>>>
>> >>>>>>> what
>> >>>>>>>>
>> >>>>>>>>> calling bypass() will skip, a coprocessor implementer should
>> >>>>>>>>>>>
>> >>>>>>>>>> read
>> >>>>>
>> >>>>>> and
>> >>>>>>>
>> >>>>>>>> understand all of the remaining code and its nuances.
>> >>>>>>>>>>>
>> >>>>>>>>>> Although I
>> >>>>
>> >>>>> think
>> >>>>>>>
>> >>>>>>>> this
>> >>>>>>>>>>
>> >>>>>>>>>>> is good practice for coprocessor developers in general, it
>> >>>>>>>>>>>
>> >>>>>>>>>> demands a
>> >>>>>>
>> >>>>>>> lot. I
>> >>>>>>>>>>
>> >>>>>>>>>>> think it would provide a much better developer experience if
>> >>>>>>>>>>>
>> >>>>>>>>>> we
>> >>>>
>> >>>>> didn't
>> >>>>>>>
>> >>>>>>>> allow bypass, even though it means - in theory - a
>> >>>>>>>>>>>
>> >>>>>>>>>> coprocessor
>> >>>
>> >>>> would
>> >>>>>>
>> >>>>>>> be a
>> >>>>>>>>
>> >>>>>>>>> lot more limited in some ways than before. What is skipped
>> >>>>>>>>>>>
>> >>>>>>>>>> is
>> >>>
>> >>>> extremely
>> >>>>>>>>
>> >>>>>>>>> version dependent. That core code will vary, perhaps
>> >>>>>>>>>>>
>> >>>>>>>>>> significantly,
>> >>>>>>
>> >>>>>>> even
>> >>>>>>>>
>> >>>>>>>>> between point releases. We do not provide the promise of
>> >>>>>>>>>>>
>> >>>>>>>>>> consistent
>> >>>>>>
>> >>>>>>> behavior even between point releases for the bypass
>> >>>>>>>>>>>
>> >>>>>>>>>> semantic.
>> >>>
>> >>>> To
>> >>>>
>> >>>>> achieve
>> >>>>>>>>
>> >>>>>>>>> that we could not change any code between hook points.
>> >>>>>>>>>>>
>> >>>>>>>>>> Therefore
>> >>>>
>> >>>>> the
>> >>>>>>
>> >>>>>>> coprocessor implementer becomes an HBase core developer in
>> >>>>>>>>>>>
>> >>>>>>>>>> practice
>> >>>>>>
>> >>>>>>> as
>> >>>>>>>
>> >>>>>>>> soon
>> >>>>>>>>>>
>> >>>>>>>>>>> as they rely on bypass(). Every release of HBase may break
>> >>>>>>>>>>>
>> >>>>>>>>>> the
>> >>>
>> >>>> assumption
>> >>>>>>>>
>> >>>>>>>>> that the replacement for the bypassed code takes care of all
>> >>>>>>>>>>>
>> >>>>>>>>>> necessary
>> >>>>>>>
>> >>>>>>>> skipped concerns. Because those concerns can change at any
>> >>>>>>>>>>>
>> >>>>>>>>>> point,
>> >>>>>
>> >>>>>> such an
>> >>>>>>>>
>> >>>>>>>>> assumption is never safe.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I say "in theory" because I would be surprised if anyone is
>> >>>>>>>>>>>
>> >>>>>>>>>> relying
>> >>>>>>
>> >>>>>>> on
>> >>>>>>>
>> >>>>>>>> the
>> >>>>>>>>>>
>> >>>>>>>>>>> bypass for the above reason. I seem to recall that Phoenix
>> >>>>>>>>>>>
>> >>>>>>>>>> might
>> >>>>
>> >>>>> use
>> >>>>>>
>> >>>>>>> it
>> >>>>>>>>
>> >>>>>>>>> in
>> >>>>>>>>>>
>> >>>>>>>>>>> one place to promote a normal mutation into an atomic
>> >>>>>>>>>>>
>> >>>>>>>>>> operation,
>> >>>>
>> >>>>> by
>> >>>>>>
>> >>>>>>> substituting one for the other, but if so that objective
>> >>>>>>>>>>>
>> >>>>>>>>>> could
>> >>>
>> >>>> be
>> >>>>>
>> >>>>>> reimplemented using their new locking manager.
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>
>> >>>
>> >>
>>
>
>
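The thread weighs three ways for a preXXX hook to tell the framework to skip the core operation:

- today's env.bypass() flag;
- Duo's suggestion that the hook return a boolean;
- Andrew's suggestion that the hook throw a dedicated exception.

The Java below is a minimal, self-contained sketch of how a caller might consume each signal. Env, the three observer interfaces, BypassException, coreGet and the get* helpers are hypothetical stand-ins. They are not HBase classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: three ways a framework could let a preXXX hook skip
 * the core operation. None of these types are HBase APIs.
 */
public class BypassStyles {

    // Style 1 (today): the hook sets a flag on its environment via bypass().
    static final class Env {
        private boolean bypass;
        void bypass() { bypass = true; }
        boolean shouldBypass() { return bypass; }
    }
    interface FlagObserver { void preGet(Env env, String row, List<String> results); }

    // Style 2 (Duo's suggestion): the hook returns true to skip the core operation.
    interface BooleanObserver { boolean preGet(String row, List<String> results); }

    // Style 3 (Andrew's suggestion): the hook throws a dedicated exception.
    static final class BypassException extends RuntimeException {
        BypassException() { super("bypass", null, false, false); } // no stack trace captured
    }
    interface ThrowingObserver { void preGet(String row, List<String> results); }

    // Stand-in for the core read path that a bypass would skip.
    static void coreGet(String row, List<String> results) {
        results.add("core:" + row);
    }

    static List<String> getWithFlag(FlagObserver cp, String row) {
        List<String> results = new ArrayList<>();
        Env env = new Env();
        cp.preGet(env, row, results);
        if (!env.shouldBypass()) {
            coreGet(row, results);
        }
        return results;
    }

    static List<String> getWithBoolean(BooleanObserver cp, String row) {
        List<String> results = new ArrayList<>();
        if (!cp.preGet(row, results)) {
            coreGet(row, results);
        }
        return results;
    }

    // Only call sites that support bypass would catch the exception; anywhere
    // else it would propagate as an error instead of being silently ignored.
    static List<String> getWithException(ThrowingObserver cp, String row) {
        List<String> results = new ArrayList<>();
        try {
            cp.preGet(row, results);
            coreGet(row, results);
        } catch (BypassException e) {
            // Core operation skipped; results hold whatever the hook produced.
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(getWithFlag((env, row, res) -> { res.add("cp:" + row); env.bypass(); }, "r1"));
        System.out.println(getWithBoolean((row, res) -> { res.add("cp:" + row); return true; }, "r1"));
        System.out.println(getWithException((row, res) -> { res.add("cp:" + row); throw new BypassException(); }, "r1"));
        System.out.println(getWithBoolean((row, res) -> false, "r1")); // prints [core:r1]
    }
}
```

In the exception style, a hook that throws where bypass is unsupported surfaces an error instead of becoming a silent no-op, which is the failure mode Andrew describes for bypass() today. The boolean style gives the same explicitness without the cost of creating and catching an exception.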

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Josh Elser <el...@apache.org>.

Thanks all!

On 10/18/17 12:46 AM, Stack wrote:
> I was going to pick up on the bypass after HBASE-19007 lands, cleaning up
> our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007
> was going bad for a good while but lots of contributors and good discussion
> and now I think we have it). Shouldn't be too much longer.
> 
> It's CP API so I was figuring it an alpha-4 item.
> 
> St.Ack
>
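Duo objects that creating exceptions is expensive. He also mentions a common workaround: allocate one exception instance up front and throw it every time. Below is a minimal sketch of that trick, built around a hypothetical BypassSignal type. It uses the protected four-argument RuntimeException constructor to disable suppression and stack-trace capture, and capturing the stack trace is typically the expensive part of creating an exception.

```java
/** Hypothetical sketch of the "global exception instance" trick; not an HBase API. */
public class PreallocatedBypass {

    // Suppression and stack-trace capture are disabled, so a single shared
    // instance can be thrown repeatedly without per-throw allocation.
    static final class BypassSignal extends RuntimeException {
        static final BypassSignal INSTANCE = new BypassSignal();
        private BypassSignal() { super("bypass", null, false, false); }
    }

    interface Hook { void pre(String row); }

    // Returns true when the hook requested a bypass by throwing the signal.
    static boolean runHook(Hook hook, String row) {
        try {
            hook.pre(row);
            return false;
        } catch (BypassSignal signal) {
            return true;
        }
    }

    public static void main(String[] args) {
        Hook bypassing = row -> { throw BypassSignal.INSTANCE; };
        Hook passing = row -> { };
        System.out.println(runHook(bypassing, "r1")); // true
        System.out.println(runHook(passing, "r1"));   // false
        System.out.println(BypassSignal.INSTANCE.getStackTrace().length); // 0
    }
}
```

The trade-off is that a shared instance carries no per-call context, so neither a stack trace nor a call-specific message is available. That is one reason a plain return value can be the simpler contract.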

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

I was going to pick up on the bypass after HBASE-19007 lands, cleaning up
our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007
was going bad for a good while but lots of contributors and good discussion
and now I think we have it). Shouldn't be too much longer.

It's CP API so I was figuring it an alpha-4 item.

St.Ack
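
In the thread, Anoop distinguishes two cases:

- A pre hook returns a non-null object. His example is preScannerOpen() returning a scanner, which skips core creation of that object.
- A post hook wraps the core scanner. Phoenix relies on this, and it bypasses nothing.

The sketch below models both cases under hypothetical names (Scanner, Observer, openScanner, WRAPPING, REPLACING). It is not HBase's RegionObserver or RegionScanner API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: pre-hook substitution vs. post-hook wrapping. */
public class ScannerHooks {

    interface Scanner { String next(); } // returns null when exhausted

    interface Observer {
        // Returning non-null here replaces (bypasses) core scanner creation.
        default Scanner preScannerOpen(String table) { return null; }
        // Receives the scanner that will be used and may return a wrapper.
        default Scanner postScannerOpen(String table, Scanner s) { return s; }
    }

    static Scanner coreScanner(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    static Scanner openScanner(Observer cp, String table, List<String> rows) {
        Scanner s = cp.preScannerOpen(table);
        if (s == null) {
            s = coreScanner(rows); // core path runs only if the pre hook returned null
        }
        return cp.postScannerOpen(table, s);
    }

    // Post-hook wrapping: the core scanner is still created; results are decorated.
    static final Observer WRAPPING = new Observer() {
        @Override
        public Scanner postScannerOpen(String table, Scanner core) {
            return () -> {
                String next = core.next();
                return next == null ? null : next.toUpperCase();
            };
        }
    };

    // Pre-hook substitution: the coprocessor supplies its own scanner instead.
    static final Observer REPLACING = new Observer() {
        @Override
        public Scanner preScannerOpen(String table) {
            return coreScanner(Arrays.asList("x"));
        }
    };

    static String drain(Scanner s) {
        StringBuilder sb = new StringBuilder();
        for (String r = s.next(); r != null; r = s.next()) {
            sb.append(r).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b");
        System.out.println(drain(openScanner(WRAPPING, "t", rows)));  // A B
        System.out.println(drain(openScanner(REPLACING, "t", rows))); // x
    }
}
```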

On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <pa...@gmail.com>
wrote:

> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
>
> It may still be a week or two before alpha4, I think. The scan injection and
> flush/compaction trigger/track APIs are still unstable...
>
> 2017-10-18 6:12 GMT+08:00 Josh Elser <el...@apache.org>:
>
> > (catching up here)
> >
> > I'm glad to see you fine folks came to a conclusion around a reduced-scope
> > solution (correct me if I'm wrong). "Some" bypass mechanism would stay for
> > preXXX methods, and we'd remove it for the other methods? What exactly the
> > "bypass API" would be is up in the air, correct?
> >
> > Duo -- maybe you could put the "current plan" on HBASE-18770 since
> > discussion appears to have died down?
> >
> > I was originally lamenting yet another big, sweeping change to CPs when I
> > had expected alpha-4 to have already landed. But, let me play devil's
> > advocate: is this something we still think is critical to do in alpha-4? I
> > can respect wanting to address all of these smells, but I'd worry it
> > delays us further.
> >
> >
> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
> >
> >> Creating an exception is expensive, so it is not recommended to do it in
> >> the normal case. A common trick is to create a global exception instance
> >> and always throw it, to avoid creating one every time, but I think it is
> >> friendlier to just use a return value?
> >>
> >> And for me, the bypass after preXXX for normal region operations is just
> >> equivalent to a 'cancel', which is very clear and easy to understand, so I
> >> think it is OK to add bypass support for them. And also for compaction and
> >> flush, it is OK to give CP users the ability to cancel the operation as the
> >> semantic is clear, although I'm not sure how CP users would use this
> >> feature.
> >>
> >> In general, I think we can provide bypass/cancel support in preXXX methods
> >> that run at the very beginning of an operation.
> >>
> >> Thanks.
> >>
> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
> >>
> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
> >>>> its long encoding writing Increments. Not sure how we'd do that,
> >>>> selectively.
> >>>
> >>> If we can handle the rest of the trouble that you observed:
> >>>
> >>> 1) Lack of recognition and identification of when the key value to
> >>> increment doesn't exist
> >>> 2) Lack of the ability to set the timestamp of the updated key value.
> >>>
> >>> then they might be able to make it work. Perhaps a conversion from HBase
> >>> native to Phoenix LONG encoding when processing results, in the wrapping
> >>> scanner, informed by schema metadata.
> >>>
> >>> Or if we are keeping the bypass semantic in select places but implementing
> >>> it with something other than today's bypass() API (please) this would be
> >>> another candidate for where to keep it. Duo suggests keeping the semantic
> >>> in all of the basic RPC preXXX hooks for query and mutation. We could redo
> >>> those APIs to skip normal processing based on a return value or exception
> >>> but otherwise drop bypass from all the others. It will clean up areas of
> >>> confusion, e.g. can I bypass splits or flushes or not? Or what about this
> >>> arcane hook in compaction? Or [insert some deep hook here]? The answer
> >>> would be: only RPC hooks will early out, and only if you return this value,
> >>> or throw that exception.
> >>>
> >>>
> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
> >>>
> >>>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass when
> >>>> user does a Get returning instead the result of its own (Flow) Scan
> >>>> result.
> >>>> Not sure how we'd do alternative here; Timeline Server is keeping Tags
> >>>> internally.
> >>>>
> >>>>
> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <apurtell@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Rather than continue to support a weird bypass() which works in some
> >>>>> places and not in others, perhaps we can substitute it with an
> >>>>> exception? So if the coprocessor throws this exception in the pre hook
> >>>>> then where it is allowed we catch it and do the right thing, and where
> >>>>> it is not allowed we don't catch it and the server aborts. This will at
> >>>>> least improve the silent bypass() failure problem. I also don't like, in
> >>>>> retrospect, that calling this environment method has magic side effects.
> >>>>> Everyone understands how exceptions work, so it will be clearer.
> >>>>>
> >>>>>
> >>>> We could do that though throw and catch of exceptions would be costly.
> >>>>
> >>>> What about the Duo suggestion? Purge bypass flag and replace it w/ preXXX
> >>>> in a few select methods returning a boolean on whether bypass? Would that
> >>>> work? (Would have to figure metrics still).
> >>>>
> >>>>
> >>>>
> >>>>> In any case we should try to address the Tephra and Phoenix cases
> >>>>> brought up in this discussion. They look like we can find alternatives.
> >>>>> Shall I file JIRAs to follow up?
> >>>>>
> >>>>>
> >>>>>
> >>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
> >>>> its long encoding writing Increments. Not sure how we'd do that,
> >>>> selectively.
> >>>>
> >>>> St.Ack
> >>>>
> >>>>
> >>>>
> >>>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <palomino219@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> These examples are great.
> >>>>>>
> >>>>>> And I think for normal region operations such as get, put, delete,
> >>>>>> checkAndXXX, increment, it is OK to bypass the real operation after
> >>>>>> preXXX as the semantic is clear enough. Instead of calling env.bypass,
> >>>>>> maybe just let these preXXX methods return a boolean is enough to tell
> >>>>>> the HBase framework that we have already done the real operation so
> >>>>>> just give up and return?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:
> >>>>>>
> >>>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
> >>>>>>> preDelete() to override handling of delete tombstones in a
> >>>>>>> transactional way:
> >>>>>>> https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
> >>>>>>>
> >>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in
> >>>>>>>
> >>>>>> preGetOp()
> >>>
> >>>> and
> >>>>>
> >>>>>> preIncrementAfterRRowLock() to provide a transaction implementation
> >>>>>>>
> >>>>>> of
> >>>>
> >>>>> readless increments:
> >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
> >>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
> >>>>>>> hbase11/IncrementHandler.java#L121
> >>>>>>>
> >>>>>>> What would be the alternate approach for these applications?  In
> >>>>>>>
> >>>>>> both
> >>>
> >>>> cases
> >>>>>>
> >>>>>>> they need to impose their own semantics on the underlying KeyValue
> >>>>>>> storage.  Is there a different way this can be done?
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <anoop.hbase@gmail.com
> >>>>>>>
> >>>>>>
> >>>> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> Wrap core scanners is different right?  That can be done in post
> >>>>>>>> hooks.  I have seen many use cases for this..  Its the question
> >>>>>>>>
> >>>>>>> abt
> >>>
> >>>> the pre hooks where we have not yet created the core object (like
> >>>>>>>> scanner).  The CP pre code itself doing the work of object
> >>>>>>>>
> >>>>>>> creation
> >>>
> >>>> and so the core code is been bypassed.    Well the wrapping thing
> >>>>>>>>
> >>>>>>> can
> >>>>
> >>>>> be done in pre hook also. First create the core object by CP code
> >>>>>>>> itself and then do the wrapped object and return.. I have seen in
> >>>>>>>>
> >>>>>>> one
> >>>>
> >>>>> jira issue where the usage was this way..   The wrapping can be
> >>>>>>>>
> >>>>>>> done
> >>>>
> >>>>> in post also in such cases I believe.
> >>>>>>>>
> >>>>>>>> -Anoop-
> >>>>>>>>
> >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
> >>>>>>>>
> >>>>>>> apurtell@apache.org>
> >>>>>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I think we should continue to support overriding function by
> >>>>>>>>>
> >>>>>>>> object
> >>>>
> >>>>> inheritance. I didn't mention this and am not proposing more
> >>>>>>>>>
> >>>>>>>> than
> >>>
> >>>> removing
> >>>>>>>>
> >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely
> >>>>>>>>>
> >>>>>>>> depends
> >>>
> >>>> on
> >>>>>
> >>>>>> being
> >>>>>>>>
> >>>>>>>>> able to wrap core scanners and return the wrappers.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
> >>>>>>>>>
> >>>>>>>> anoop.hbase@gmail.com>
> >>>>>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> When we say bypass the core code, it can be done today not
> >>>>>>>>>>
> >>>>>>>>> only
> >>>
> >>>> by
> >>>>
> >>>>> calling bypass but by returning a not null object for some of
> >>>>>>>>>>
> >>>>>>>>> the
> >>>>
> >>>>> pre
> >>>>>>
> >>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
> >>>>>>>>>>
> >>>>>>>>> we
> >>>
> >>>> will
> >>>>>
> >>>>>> avoid the remaining core code execution for creation of the
> >>>>>>>>>> scanner(s).  So this proposal include this aspect also and
> >>>>>>>>>>
> >>>>>>>>> remove
> >>>>
> >>>>> any
> >>>>>>
> >>>>>>> possible way of bypassing the core code by the CP hook code
> >>>>>>>>>>
> >>>>>>>>> execution
> >>>>>>
> >>>>>>> ?   Am +1.
> >>>>>>>>>>
> >>>>>>>>>> -Anoop-
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <
> >>>>>>>>>>
> >>>>>>>>> apurtell@apache.org
> >>>>>>
> >>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> The coprocessor API provides an environment method,
> >>>>>>>>>>>
> >>>>>>>>>> bypass(),
> >>>
> >>>> that
> >>>>>
> >>>>>> when
> >>>>>>>>
> >>>>>>>>> called from a preXXX hook will cause the core code to skip
> >>>>>>>>>>>
> >>>>>>>>>> all
> >>>
> >>>> remaining
> >>>>>>>>
> >>>>>>>>> processing. This capability was introduced on HBASE-3348.
> >>>>>>>>>>>
> >>>>>>>>>> Since
> >>>>
> >>>>> this
> >>>>>>
> >>>>>>> time I
> >>>>>>>>>>
> >>>>>>>>>>> think we are more enlightened about the complications of
> >>>>>>>>>>>
> >>>>>>>>>> this
> >>>
> >>>> feature.
> >>>>>>>
> >>>>>>>> (Or,
> >>>>>>>>>>
> >>>>>>>>>>> anyway, speaking for myself:)
> >>>>>>>>>>>
> >>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is the
> >>>>>>>>>>>
> >>>>>>>>>> case
> >>>>>
> >>>>>> the
> >>>>>>>
> >>>>>>>> javadoc for the hook says so, but it can be missed. If you
> >>>>>>>>>>>
> >>>>>>>>>> call
> >>>>
> >>>>> bypass()
> >>>>>>>>
> >>>>>>>>> in
> >>>>>>>>>>
> >>>>>>>>>>> a hook where it is not supported it is a no-op. This can
> >>>>>>>>>>>
> >>>>>>>>>> lead
> >>>
> >>>> to a
> >>>>>
> >>>>>> poor
> >>>>>>>>
> >>>>>>>>> developer experience.
> >>>>>>>>>>>
> >>>>>>>>>>> Where bypass is supported what is being bypassed is all of
> >>>>>>>>>>>
> >>>>>>>>>> the
> >>>
> >>>> core
> >>>>>>
> >>>>>>> code
> >>>>>>>>
> >>>>>>>>> implementing the remainder of the operation. In order to
> >>>>>>>>>>>
> >>>>>>>>>> understand
> >>>>>>
> >>>>>>> what
> >>>>>>>>
> >>>>>>>>> calling bypass() will skip, a coprocessor implementer should
> >>>>>>>>>>>
> >>>>>>>>>> read
> >>>>>
> >>>>>> and
> >>>>>>>
> >>>>>>>> understand all of the remaining code and its nuances.
> >>>>>>>>>>>
> >>>>>>>>>> Although I
> >>>>
> >>>>> think
> >>>>>>>
> >>>>>>>> this
> >>>>>>>>>>
> >>>>>>>>>>> is good practice for coprocessor developers in general, it
> >>>>>>>>>>>
> >>>>>>>>>> demands a
> >>>>>>
> >>>>>>> lot. I
> >>>>>>>>>>
> >>>>>>>>>>> think it would provide a much better developer experience if
> >>>>>>>>>>>
> >>>>>>>>>> we
> >>>>
> >>>>> didn't
> >>>>>>>
> >>>>>>>> allow bypass, even though it means - in theory - a
> >>>>>>>>>>>
> >>>>>>>>>> coprocessor
> >>>
> >>>> would
> >>>>>>
> >>>>>>> be a
> >>>>>>>>
> >>>>>>>>> lot more limited in some ways than before. What is skipped
> >>>>>>>>>>>
> >>>>>>>>>> is
> >>>
> >>>> extremely
> >>>>>>>>
> >>>>>>>>> version dependent. That core code will vary, perhaps
> >>>>>>>>>>>
> >>>>>>>>>> significantly,
> >>>>>>
> >>>>>>> even
> >>>>>>>>
> >>>>>>>>> between point releases. We do not provide the promise of
> >>>>>>>>>>>
> >>>>>>>>>> consistent
> >>>>>>
> >>>>>>> behavior even between point releases for the bypass
> >>>>>>>>>>>
> >>>>>>>>>> semantic.
> >>>
> >>>> To
> >>>>
> >>>>> achieve
> >>>>>>>>
> >>>>>>>>> that we could not change any code between hook points.
> >>>>>>>>>>>
> >>>>>>>>>> Therefore
> >>>>
> >>>>> the
> >>>>>>
> >>>>>>> coprocessor implementer becomes an HBase core developer in
> >>>>>>>>>>>
> >>>>>>>>>> practice
> >>>>>>
> >>>>>>> as
> >>>>>>>
> >>>>>>>> soon
> >>>>>>>>>>
> >>>>>>>>>>> as they rely on bypass(). Every release of HBase may break
> >>>>>>>>>>>
> >>>>>>>>>> the
> >>>
> >>>> assumption
> >>>>>>>>
> >>>>>>>>> that the replacement for the bypassed code takes care of all
> >>>>>>>>>>>
> >>>>>>>>>> necessary
> >>>>>>>
> >>>>>>>> skipped concerns. Because those concerns can change at any
> >>>>>>>>>>>
> >>>>>>>>>> point,
> >>>>>
> >>>>>> such an
> >>>>>>>>
> >>>>>>>>> assumption is never safe.
> >>>>>>>>>>>
> >>>>>>>>>>> I say "in theory" because I would be surprised if anyone is
> >>>>>>>>>>>
> >>>>>>>>>> relying
> >>>>>>
> >>>>>>> on
> >>>>>>>
> >>>>>>>> the
> >>>>>>>>>>
> >>>>>>>>>>> bypass for the above reason. I seem to recall that Phoenix
> >>>>>>>>>>>
> >>>>>>>>>> might
> >>>>
> >>>>> use
> >>>>>>
> >>>>>>> it
> >>>>>>>>
> >>>>>>>>> in
> >>>>>>>>>>
> >>>>>>>>>>> one place to promote a normal mutation into an atomic
> >>>>>>>>>>>
> >>>>>>>>>> operation,
> >>>>
> >>>>> by
> >>>>>>
> >>>>>>> substituting one for the other, but if so that objective
> >>>>>>>>>>>
> >>>>>>>>>> could
> >>>
> >>>> be
> >>>>>
> >>>>>> reimplemented using their new locking manager.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>
> >>>
> >>
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

Fine. Let me change the title of HBASE-18770 and prepare a patch there.

It may still be a week or two before alpha-4, I think. The scan injection
and flush/compaction trigger/track APIs are still unstable...

2017-10-18 6:12 GMT+08:00 Josh Elser <el...@apache.org>:

> (catching up here)
>
> I'm glad to see you fine folks came to a conclusion around a reduced-scope
> solution (correct me if I'm wrong). "Some" bypass mechanism would stay for
> preXXX methods, and we'd remove it for the other methods? What exactly the
> "bypass API" would be is up in the air, correct?
>
> Duo -- maybe you could put the "current plan" on HBASE-18770 since
> discussion appears to have died down?
>
> I was originally lamenting yet another big, sweeping change to CPs when I
> had expected alpha-4 to have already landed. But, let me play devil's
> advocate: is this something we still think is critical to do in alpha-4? I
> can respect wanting to address all of these smells, but I worry it
> delays us further.
>
>
> On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>
>> Creating an exception is expensive, so it is not advisable to do it in the
>> normal case. A common trick is to create a global exception instance and
>> always throw it, to avoid creating a new one every time, but I think it is
>> friendlier to just use a return value?
>>
>> And for me, the bypass after preXXX for normal region operations is just
>> equivalent to a 'cancel', which is very clear and easy to understand, so I
>> think it is OK to add bypass support for them. Also for compaction and
>> flush, it is OK to give CP users the ability to cancel the operation, as
>> the semantic is clear, although I'm not sure how CP users would use this
>> feature.
>>
>> In general, I think we can provide bypass/cancel support in the preXXX
>> methods that run at the very beginning of an operation.
>>
>> Thanks.
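The "global exception instance" trick Duo mentions can be sketched in Java alongside the return-value style he prefers. This is a minimal, hypothetical sketch: `BypassSignal` and `ExceptionVsReturn` are invented names and are not HBase API. It relies only on the standard `RuntimeException(String, Throwable, boolean, boolean)` constructor, which lets a subclass skip stack-trace capture. Stack-trace capture is often the expensive part of creating an exception.

```java
// Hypothetical sketch; these types are not part of the HBase coprocessor API.

/** A preallocated, stackless exception used only as a control-flow signal. */
final class BypassSignal extends RuntimeException {
  private static final long serialVersionUID = 1L;

  /** One shared instance: no allocation and no stack walk per throw. */
  static final BypassSignal INSTANCE = new BypassSignal();

  private BypassSignal() {
    // enableSuppression = false, writableStackTrace = false:
    // fillInStackTrace() is never called, so no stack trace is captured.
    super("bypass", null, false, false);
  }
}

final class ExceptionVsReturn {
  private ExceptionVsReturn() {}

  /** Style A: the hook requests a bypass by throwing the shared instance. */
  static void preGetThrowing(boolean wantBypass) {
    if (wantBypass) {
      throw BypassSignal.INSTANCE;
    }
  }

  /** Style B: the hook requests a bypass through its return value. */
  static boolean preGetReturning(boolean wantBypass) {
    return wantBypass;
  }

  static String getWithThrowingHook(boolean wantBypass) {
    try {
      preGetThrowing(wantBypass);
    } catch (BypassSignal signal) {
      return "bypassed";
    }
    return "core get";
  }

  static String getWithReturningHook(boolean wantBypass) {
    return preGetReturning(wantBypass) ? "bypassed" : "core get";
  }

  public static void main(String[] args) {
    System.out.println(getWithThrowingHook(true));   // bypassed
    System.out.println(getWithReturningHook(false)); // core get
    System.out.println(BypassSignal.INSTANCE.getStackTrace().length); // 0
  }
}
```

Because the shared instance carries no stack trace, a throw that escapes somewhere unexpected leaves nothing to debug from. The return-value style avoids both the allocation question and that debugging gap.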
>>
>> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
>>
>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>> selectively.
>>>
>>> If we can handle the rest of the trouble that you observed:
>>>
>>> 1) Lack of recognition and identification of when the key value to
>>> increment doesn't exist
>>> 2) Lack of the ability to set the timestamp of the updated key value.
>>>
>>> then they might be able to make it work. Perhaps a conversion from HBase
>>> native to Phoenix LONG encoding when processing results, in the wrapping
>>> scanner, informed by schema metadata.
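A conversion inside the wrapping scanner could look roughly like the following. The sketch makes two assumptions. First, HBase's native Increment value is an 8-byte big-endian two's-complement long, which is what `Bytes.toBytes(long)` produces. Second, Phoenix's LONG type uses the same bytes with the sign bit flipped, so that byte-wise order matches numeric order. `LongEncodings` and its methods are hypothetical names. A real implementation would also need the schema metadata to decide which columns to convert.

```java
import java.nio.ByteBuffer;

// Hypothetical helper a wrapping scanner might use; not Phoenix or HBase code.
// Assumption: Phoenix LONG = big-endian 8 bytes with the sign bit flipped.
final class LongEncodings {
  private LongEncodings() {}

  /** HBase-native long: 8-byte big-endian two's complement. */
  static byte[] hbaseNative(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  /** Flip the sign bit so that unsigned byte order matches numeric order. */
  static byte[] nativeToPhoenix(byte[] nativeBytes) {
    if (nativeBytes.length != Long.BYTES) {
      throw new IllegalArgumentException("expected 8 bytes, got " + nativeBytes.length);
    }
    byte[] out = nativeBytes.clone();
    out[0] ^= (byte) 0x80;
    return out;
  }

  /** Flipping the same bit again restores the native form. */
  static byte[] phoenixToNative(byte[] phoenixBytes) {
    return nativeToPhoenix(phoenixBytes);
  }

  /** Unsigned lexicographic comparison, the order HBase uses for raw bytes. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    byte[] minusOne = hbaseNative(-1L);
    byte[] one = hbaseNative(1L);
    // Native bytes: -1 (0xFF...) sorts after 1 (0x00...01).
    System.out.println(compareUnsigned(minusOne, one) > 0); // true
    // Sign-flipped bytes: -1 (0x7F...) sorts before 1 (0x80...01).
    System.out.println(compareUnsigned(nativeToPhoenix(minusOne), nativeToPhoenix(one)) < 0); // true
  }
}
```

Only the first byte changes, so the conversion is cheap enough to apply per cell. It does not address the other two problems listed above: detecting a missing cell and setting the timestamp.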
>>>
>>> Or if we are keeping the bypass semantic in select places but implementing
>>> it with something other than today's bypass() API (please), this would be
>>> another candidate for where to keep it. Duo suggests keeping the semantic
>>> in all of the basic RPC preXXX hooks for query and mutation. We could redo
>>> those APIs to skip normal processing based on a return value or exception
>>> but otherwise drop bypass from all the others. It will clean up areas of
>>> confusion, e.g. can I bypass splits or flushes or not? Or what about this
>>> arcane hook in compaction? Or [insert some deep hook here]? The answer
>>> would be: only RPC hooks will early out, and only if you return this
>>> value or throw that exception.
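One way to make "only RPC hooks will early out, and only if you return this value" enforceable is to build it into the hook signatures. RPC-level pre hooks return a boolean, while deeper hooks return `void`, so a flush hook simply has no way to request a bypass. The sketch is hypothetical: `RpcObserver`, `InternalObserver` and `MiniRegion` are invented for illustration and are not HBase types. It runs every observer and skips the core read if any of them returned `true`. Whether the first `true` should short-circuit the remaining observers is a design choice the sketch does not settle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types; not the HBase coprocessor API.

/** RPC-level hook: returning true is the only way to skip core processing. */
interface RpcObserver {
  boolean preGet(String row, List<String> results);
}

/** Internal hook: void, so it cannot request a bypass at all. */
interface InternalObserver {
  default void preFlush() {}
}

final class MiniRegion {
  private final Map<String, String> store = new HashMap<>();
  private final List<RpcObserver> rpcObservers = new ArrayList<>();
  private final List<InternalObserver> internalObservers = new ArrayList<>();
  private int flushCount = 0;

  void put(String row, String value) { store.put(row, value); }
  void addRpcObserver(RpcObserver o) { rpcObservers.add(o); }
  void addInternalObserver(InternalObserver o) { internalObservers.add(o); }
  int flushCount() { return flushCount; }

  List<String> get(String row) {
    List<String> results = new ArrayList<>();
    boolean bypass = false;
    for (RpcObserver o : rpcObservers) {
      bypass |= o.preGet(row, results); // every observer still runs
    }
    if (bypass) {
      return results; // core read skipped; observers supplied the results
    }
    String value = store.get(row);
    if (value != null) {
      results.add(value);
    }
    return results;
  }

  void flush() {
    for (InternalObserver o : internalObservers) {
      o.preFlush(); // nothing to return, so nothing to bypass
    }
    flushCount++;
  }

  public static void main(String[] args) {
    MiniRegion region = new MiniRegion();
    region.put("r1", "stored");
    System.out.println(region.get("r1")); // [stored]
    region.addRpcObserver((row, results) -> {
      results.add("from-cp:" + row);
      return true;
    });
    System.out.println(region.get("r1")); // [from-cp:r1]
  }
}
```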
>>>
>>>
>>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
>>>
>>>> The YARN Timeline Server has the FlowRunCoprocessor. It does a bypass when
>>>> a user does a Get, returning instead the result of its own (Flow) Scan.
>>>> Not sure how we'd do an alternative here; Timeline Server is keeping Tags
>>>> internally.
>>>>
>>>>
>>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <ap...@apache.org> wrote:
>>>>
>>>>> Rather than continue to support a weird bypass() which works in some
>>>>> places and not in others, perhaps we can substitute it with an
>>>>> exception? So if the coprocessor throws this exception in the pre hook,
>>>>> then where it is allowed we catch it and do the right thing, and where it
>>>>> is not allowed we don't catch it and the server aborts. This will at
>>>>> least improve the silent bypass() failure problem. I also don't like, in
>>>>> retrospect, that calling this environment method has magic side effects.
>>>>> Everyone understands how exceptions work, so it will be clearer.
>>>>
>>>> We could do that, though throwing and catching exceptions would be
>>>> costly.
>>>>
>>>> What about the Duo suggestion? Purge the bypass flag and replace it w/
>>>> preXXX in a few select methods returning a boolean on whether to bypass?
>>>> Would that work? (Would have to figure out metrics still.)
>>>>
>>>>> In any case we should try to address the Tephra and Phoenix cases
>>>>> brought up in this discussion. It looks like we can find alternatives.
>>>>> Shall I file JIRAs to follow up?
>>>>
>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>> selectively.
>>>>
>>>> St.Ack
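Andrew's exception-based alternative can be sketched as follows. The framework catches the hypothetical `BypassRequest` only at call sites where bypass is permitted and lets it propagate everywhere else, so a misplaced bypass fails loudly instead of being a silent no-op. All names here are invented for illustration. The sketch allocates a new exception per request, so it does not address the cost concern raised in the reply.

```java
// Hypothetical sketch of exception-signalled bypass; not HBase API.

/** Thrown by a pre hook to ask the framework to skip core processing. */
final class BypassRequest extends RuntimeException {
  private static final long serialVersionUID = 1L;

  BypassRequest(String hook) {
    super("bypass requested in " + hook);
  }
}

interface Observer {
  default void preGet(String row) {}
  default void preSplit() {}
}

final class HookRunner {
  private HookRunner() {}

  /** Bypass is permitted for get: catch the request and skip the core read. */
  static String get(Observer cp, String row) {
    try {
      cp.preGet(row);
    } catch (BypassRequest request) {
      return "core get skipped for " + row;
    }
    return "core get ran for " + row;
  }

  /**
   * Bypass is not permitted for split: the request is deliberately not caught.
   * In a real server this would surface as a failure (the proposal above says
   * the server aborts) instead of being silently ignored.
   */
  static String split(Observer cp) {
    cp.preSplit();
    return "core split ran";
  }

  public static void main(String[] args) {
    Observer bypassEverything = new Observer() {
      @Override public void preGet(String row) { throw new BypassRequest("preGet"); }
      @Override public void preSplit() { throw new BypassRequest("preSplit"); }
    };
    System.out.println(get(bypassEverything, "r1")); // core get skipped for r1
    try {
      split(bypassEverything);
    } catch (BypassRequest escaped) {
      System.out.println("not allowed: " + escaped.getMessage());
    }
  }
}
```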
>>>>
>>>>
>>>>
>>>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>>>>>
>>>>>> These examples are great.
>>>>>>
>>>>>> And I think for normal region operations such as get, put, delete,
>>>>>> checkAndXXX and increment, it is OK to bypass the real operation after
>>>>>> preXXX, as the semantic is clear enough. Instead of calling env.bypass,
>>>>>> maybe just letting these preXXX methods return a boolean is enough to
>>>>>> tell the HBase framework that we have already done the real operation,
>>>>>> so it should just give up and return?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:
>>>>>>
>>>>>>> The Tephra TransactionProcessor CP makes use of bypass() in preDelete()
>>>>>>> to override handling of delete tombstones in a transactional way:
>>>>>>> https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
>>>>>>>
>>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in preGetOp()
>>>>>>> and preIncrementAfterRowLock() to provide a transactional
>>>>>>> implementation of readless increments:
>>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
>>>>>>>
>>>>>>> What would be the alternate approach for these applications? In both
>>>>>>> cases they need to impose their own semantics on the underlying
>>>>>>> KeyValue storage. Is there a different way this can be done?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <anoop.hbase@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Wrapping core scanners is different, right? That can be done in post
>>>>>>>> hooks. I have seen many use cases for this. It's the question about
>>>>>>>> the pre hooks, where we have not yet created the core object (like a
>>>>>>>> scanner). The CP pre code itself does the work of object creation, and
>>>>>>>> so the core code is bypassed. Well, the wrapping can be done in the
>>>>>>>> pre hook also: first create the core object in the CP code itself,
>>>>>>>> then wrap it and return the wrapper. I have seen one JIRA issue where
>>>>>>>> the usage was this way. The wrapping can be done in the post hook in
>>>>>>>> such cases too, I believe.
>>>>>>>>
>>>>>>>> -Anoop-
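Anoop's point that wrapping does not require a bypass can be illustrated with a plain decorator. The core code always builds the scanner, and a post hook returns a wrapper around it. `Scanner`, `ListScanner`, `UpperCasingScanner` and `WrapInPostHook` are hypothetical stand-ins, far simpler than HBase's real `RegionScanner`. The upper-casing is only a placeholder for whatever transformation a coprocessor might apply.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins; not the HBase scanner or observer API.

/** Minimal scanner: returns null once exhausted. */
interface Scanner {
  String next();
}

/** "Core" scanner over an in-memory list of rows. */
final class ListScanner implements Scanner {
  private final Iterator<String> it;

  ListScanner(List<String> rows) { this.it = rows.iterator(); }

  @Override public String next() { return it.hasNext() ? it.next() : null; }
}

/** Decorator: delegates to the core scanner and rewrites what it returns. */
final class UpperCasingScanner implements Scanner {
  private final Scanner delegate;

  UpperCasingScanner(Scanner delegate) { this.delegate = delegate; }

  @Override public String next() {
    String row = delegate.next();
    return row == null ? null : row.toUpperCase(Locale.ROOT);
  }
}

final class WrapInPostHook {
  private WrapInPostHook() {}

  /** Post hook: the core scanner already exists; the CP only wraps it. */
  static Scanner postScannerOpen(Scanner core) {
    return new UpperCasingScanner(core);
  }

  static Scanner openScanner(List<String> rows) {
    Scanner core = new ListScanner(rows); // core code always runs
    return postScannerOpen(core);         // no bypass needed
  }

  public static void main(String[] args) {
    Scanner s = openScanner(Arrays.asList("a", "b"));
    for (String r = s.next(); r != null; r = s.next()) {
      System.out.println(r); // A, then B
    }
  }
}
```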
>>>>>>>>
>>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <apurtell@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think we should continue to support overriding function by object
>>>>>>>>> inheritance. I didn't mention this and am not proposing more than
>>>>>>>>> removing the bypass() semantic. No more, no less. Phoenix absolutely
>>>>>>>>> depends on being able to wrap core scanners and return the wrappers.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <anoop.hbase@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> When we say bypass the core code, it can be done today not only by
>>>>>>>>>> calling bypass but also by returning a non-null object from some of
>>>>>>>>>> the pre hooks. For example, if preScannerOpen() returns a scanner
>>>>>>>>>> object, we will avoid the remaining core code execution for creation
>>>>>>>>>> of the scanner(s). So does this proposal include this aspect too, and
>>>>>>>>>> remove any possible way of bypassing the core code by CP hook code
>>>>>>>>>> execution? Am +1.
>>>>>>>>>>
>>>>>>>>>> -Anoop-
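The implicit bypass Anoop describes amounts to a simple control flow: a non-null return from a pre hook such as `preScannerOpen()` replaces core scanner creation. This is a simplified, hypothetical model in which `ScanObserver`, `ScannerOpenPath` and the single-method `Scanner` are invented. It shows why removing `bypass()` alone would not remove every way for a hook to skip core code.

```java
// Hypothetical model of "non-null pre-hook return skips core code"; not HBase API.

/** Minimal scanner stand-in: returns null once exhausted. */
interface Scanner {
  String next();
}

interface ScanObserver {
  /** Return a scanner to supply one yourself; return null to let core code run. */
  Scanner preScannerOpen(String startRow);
}

final class ScannerOpenPath {
  private int coreOpens = 0;

  int coreOpens() {
    return coreOpens;
  }

  private Scanner coreOpen(String startRow) {
    coreOpens++;
    return () -> null; // stand-in for the real region scanner (empty)
  }

  Scanner open(ScanObserver cp, String startRow) {
    Scanner fromHook = cp.preScannerOpen(startRow);
    if (fromHook != null) {
      return fromHook; // implicit bypass: core scanner creation never runs
    }
    return coreOpen(startRow);
  }

  public static void main(String[] args) {
    ScannerOpenPath path = new ScannerOpenPath();
    path.open(row -> null, "r1");            // core creates the scanner
    path.open(row -> () -> "from-cp", "r1"); // hook supplies its own scanner
    System.out.println(path.coreOpens());    // 1
  }
}
```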

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

What motivated me to ask is work for alpha-4 (I think) that rips out
coprocessor access to metrics objects. I'm saying that if we support bypass,
then metrics are a public-facing API, and this isn't a good change to make.
However, it follows that we should consider whether the way bypass works now
is a good thing to support. I claim it is not. I think we've arrived at a
consensus that "some" bypass is still good, especially for RPC, and so we
have to get changes in before 2.0 goes GA. I don't think it is a blocker
until then.



On Tue, Oct 17, 2017 at 3:29 PM, Josh Elser <el...@apache.org> wrote:

> That's fine, and I'm quite familiar with how that works.
>
> I was just trying to catch up with the express purpose of driving alpha-4
> forward. To a bystander, this looks like an unassigned blocker. I wanted to
> make sure that we had both a consensus on what people think we should do
> and someone signing up to do that work.
>
> If we don't have both, it's my opinion that we shouldn't keep it in
> alpha-4.
>
>
> On 10/17/17 6:15 PM, Andrew Purtell wrote:
>
>> Sometimes we don't arrive at a point where discussions happen and decisions
>> are made until code is about to ship. A general thing with open source, I
>> think. It is less than ideal, but it is important to strike when that iron
>> is (eventually) hot.
>>>>> by
>>>>>>
>>>>>> calling bypass but by returning a not null object for some of
>>>>>>>
>>>>>>>>
>>>>>>>>>>>> the
>>>>>>>>>>>
>>>>>>>>>>
>>>>>> pre
>>>>>>>
>>>>>>>>
>>>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> we
>>>>>>>>>>>
>>>>>>>>>>
>>>>> will
>>>>>>
>>>>>>>
>>>>>>> avoid the remaining core code execution for creation of the
>>>>>>>>
>>>>>>>>> scanner(s).  So this proposal include this aspect also and
>>>>>>>>>>>>
>>>>>>>>>>>> remove
>>>>>>>>>>>
>>>>>>>>>>
>>>>>> any
>>>>>>>
>>>>>>>>
>>>>>>>> possible way of bypassing the core code by the CP hook code
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> execution
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> ?   Am +1.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> -Anoop-
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <
>>>>>>>>>>>>
>>>>>>>>>>>> apurtell@apache.org
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The coprocessor API provides an environment method,
>>>>>>>>>>>>>
>>>>>>>>>>>>> bypass(),
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> that
>>>>>>
>>>>>>>
>>>>>>> when
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> called from a preXXX hook will cause the core code to skip
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> all
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> remaining
>>>>>>
>>>>>>>
>>>>>>>>>> processing. This capability was introduced on HBASE-3348.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Since
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> this
>>>>>>>
>>>>>>>>
>>>>>>>> time I
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> think we are more enlightened about the complications of
>>>>>>>>>>>>>
>>>>>>>>>>>>> this
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> feature.
>>>>>>
>>>>>>>
>>>>>>>>> (Or,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> anyway, speaking for myself:)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is the
>>>>>>>>>>>>>
>>>>>>>>>>>>> case
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>> the
>>>>>>>>
>>>>>>>>>
>>>>>>>>> javadoc for the hook says so, but it can be missed. If you
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> call
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> bypass()
>>>>>>>
>>>>>>>>
>>>>>>>>>> in
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> a hook where it is not supported it is a no-op. This can
>>>>>>>>>>>>>
>>>>>>>>>>>>> lead
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> to a
>>>>>>
>>>>>>>
>>>>>>> poor
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> developer experience.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Where bypass is supported what is being bypassed is all of
>>>>>>>>>>>>>
>>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> core
>>>>>>
>>>>>>>
>>>>>>>> code
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> implementing the remainder of the operation. In order to
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> understand
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> what
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> calling bypass() will skip, a coprocessor implementer should
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> read
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>> and
>>>>>>>>
>>>>>>>>>
>>>>>>>>> understand all of the remaining code and its nuances.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Although I
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> think
>>>>>>>
>>>>>>>>
>>>>>>>>> this
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> is good practice for coprocessor developers in general, it
>>>>>>>>>>>>>
>>>>>>>>>>>>> demands a
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> lot. I
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> think it would provide a much better developer experience if
>>>>>>>>>>>>>
>>>>>>>>>>>>> we
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> didn't
>>>>>>>
>>>>>>>>
>>>>>>>>> allow bypass, even though it means - in theory - a
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> coprocessor
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> would
>>>>>>
>>>>>>>
>>>>>>>> be a
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> lot more limited in some ways than before. What is skipped
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> is
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> extremely
>>>>>>
>>>>>>>
>>>>>>>>>> version dependent. That core code will vary, perhaps
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> significantly,
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> even
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> between point releases. We do not provide the promise of
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> consistent
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> behavior even between point releases for the bypass
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> semantic.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> To
>>>>>>
>>>>>> achieve
>>>>>>>
>>>>>>>>
>>>>>>>>>> that we could not change any code between hook points.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> the
>>>>>>>
>>>>>>>>
>>>>>>>> coprocessor implementer becomes an HBase core developer in
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> practice
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> as
>>>>>>>>>
>>>>>>>>> soon
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> as they rely on bypass(). Every release of HBase may break
>>>>>>>>>>>>>
>>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> assumption
>>>>>>
>>>>>>>
>>>>>>>>>> that the replacement for the bypassed code takes care of all
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> necessary
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> skipped concerns. Because those concerns can change at any
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> point,
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>> such an
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> assumption is never safe.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> I say "in theory" because I would be surprised if anyone is
>>>>>>>>>>>>>
>>>>>>>>>>>>> relying
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>> on
>>>>>>>>>
>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> bypass for the above reason. I seem to recall that Phoenix
>>>>>>>>>>>>>
>>>>>>>>>>>>> might
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> use
>>>>>>>
>>>>>>>>
>>>>>>>> it
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> in
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> one place to promote a normal mutation into an atomic
>>>>>>>>>>>>>
>>>>>>>>>>>>> operation,
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>> by
>>>>>>>
>>>>>>>>
>>>>>>>> substituting one for the other, but if so that objective
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> could
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>> be
>>>>>>
>>>>>>>
>>>>>>> reimplemented using their new locking manager.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
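
To make the contrast between the bypass() flag and Duo's boolean-return suggestion concrete, here is a minimal toy model. `ToyEnv`, `ToyObserver`, `ToyRegion` and all of their methods are hypothetical stand-ins, not HBase classes or signatures. The sketch only shows how a flag set on the environment is honored at one call site and silently ignored at another, while a return value puts the early-out contract into the hook's signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical environment with a mutable bypass flag (a toy, not HBase's class).
final class ToyEnv {
  private boolean bypass;

  void bypass() { bypass = true; }

  boolean shouldBypass() { return bypass; }
}

// Hypothetical observer exposing both hook styles discussed in the thread.
interface ToyObserver {
  // Flag style: the hook mutates the environment; only call sites that check the flag honor it.
  default void preGetFlag(ToyEnv env, String row, List<String> out) { }

  default void preFlushFlag(ToyEnv env) { }

  // Return-value style: true means "already handled, skip the core code".
  default boolean preGet(String row, List<String> out) { return false; }
}

// Hypothetical region whose "core code" is a map lookup or a flush counter.
final class ToyRegion {
  final Map<String, String> store = new HashMap<>();
  final ToyObserver cp;
  int flushes = 0;

  ToyRegion(ToyObserver cp) { this.cp = cp; }

  List<String> getWithFlag(String row) {
    ToyEnv env = new ToyEnv();
    List<String> out = new ArrayList<>();
    cp.preGetFlag(env, row, out);
    if (env.shouldBypass()) {
      return out;                 // this call site happens to check the flag
    }
    String v = store.get(row);    // core code
    if (v != null) out.add(v);
    return out;
  }

  void flushWithFlag() {
    ToyEnv env = new ToyEnv();
    cp.preFlushFlag(env);
    flushes++;                    // never checks the flag: bypass() here is a silent no-op
  }

  List<String> getWithReturn(String row) {
    List<String> out = new ArrayList<>();
    if (cp.preGet(row, out)) {
      return out;                 // the early-out contract is visible in the signature
    }
    String v = store.get(row);    // core code
    if (v != null) out.add(v);
    return out;
  }
}
```

Under these assumptions, a coprocessor that calls env.bypass() from preFlushFlag() still sees the flush happen, which is the silent no-op described above. With the boolean style, a hook that is not allowed to early-out would simply be declared void, so there is nothing to misuse.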

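Anoop's point that wrapping core scanners does not need bypass, because it can happen in a post hook, can be sketched the same way. The names below (`ToyScanner`, `CoreScanner`, `WrappingScanner`, `ToyHost.openScanner`, and a `postScannerOpen` passed in as a function) are hypothetical and do not mirror HBase's actual signatures. In this model the core code always creates the scanner, and the hook can only return it unchanged or decorate it; any per-row transform, such as an encoding conversion, lives in the wrapper's `next()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical scanner contract: next() returns one row, or null once exhausted.
interface ToyScanner {
  String next();
}

// Hypothetical "core" scanner over an in-memory list of rows.
final class CoreScanner implements ToyScanner {
  private final Iterator<String> it;

  CoreScanner(List<String> rows) { this.it = rows.iterator(); }

  @Override
  public String next() { return it.hasNext() ? it.next() : null; }
}

// Decorator: delegates to whatever scanner the core code built and transforms each row.
final class WrappingScanner implements ToyScanner {
  private final ToyScanner delegate;
  private final UnaryOperator<String> transform;

  WrappingScanner(ToyScanner delegate, UnaryOperator<String> transform) {
    this.delegate = delegate;
    this.transform = transform;
  }

  @Override
  public String next() {
    String row = delegate.next();
    return row == null ? null : transform.apply(row);
  }
}

final class ToyHost {
  // Post-hook style: the core scanner is always created; the hook may only wrap it.
  static ToyScanner openScanner(List<String> rows, UnaryOperator<ToyScanner> postScannerOpen) {
    ToyScanner core = new CoreScanner(rows);   // core code always runs
    return postScannerOpen.apply(core);
  }

  // Reads every remaining row from a scanner.
  static List<String> drain(ToyScanner scanner) {
    List<String> out = new ArrayList<>();
    for (String row = scanner.next(); row != null; row = scanner.next()) {
      out.add(row);
    }
    return out;
  }
}
```

The design choice this illustrates: because creation is never skipped, the coprocessor does not need to know what the core code does before the hook, only the contract of the object it receives.
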
Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Josh Elser <el...@apache.org>.

That's fine, and I'm quite familiar with how that works.

I was just trying to catch up with the express purpose of driving
forward alpha-4. From where I sit as a bystander, this is an unassigned
blocker. I wanted to make sure that we had both consensus on what people
think we should do and someone signing up to do that work.

If we don't have both, it's my opinion that we shouldn't keep it in alpha-4.

On 10/17/17 6:15 PM, Andrew Purtell wrote:
> Sometimes we don't arrive at a point where discussions happen and decisions
> are made until code is about to ship. A general thing with open source, I
> think. It is less than ideal but important to strike when that iron is
> (eventually) hot.
>
> On Tue, Oct 17, 2017 at 3:12 PM, Josh Elser <el...@apache.org> wrote:
>
>> (catching up here)
>>
>> I'm glad to see you fine folks came to a conclusion around a reduced-scope
>> solution (correct me if I'm wrong). "Some" bypass mechanism would stay for
>> preXXX methods, and we'd remove it for the other methods? What exactly the
>> "bypass API" would be is up in the air, correct?
>>
>> Duo -- maybe you could put the "current plan" on HBASE-18770 since
>> discussion appears to have died down?
>>
>> I was originally lamenting yet another big, sweeping change to CPs when I
>> had expected alpha-4 to have already landed. But, let me play devil's
>> advocate: is this something we still think is critical to do in alpha-4? I
>> can respect wanting to address all of these smells, but I worry it delays
>> us further.
>>
>> On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>>
>>> Creating an exception is expensive, so it is not recommended in the normal
>>> case. A common trick is to create a global exception instance and always
>>> throw it, to avoid creating one every time, but I think it is friendlier
>>> to just use a return value?
>>>
>>> And for me, the bypass after preXXX for normal region operations just
>>> equals a 'cancel', which is very clear and easy to understand, so I think
>>> it is OK to add bypass support for them. And also for compaction and
>>> flush, it is OK to give CP users the ability to cancel the operation, as
>>> the semantic is clear, although I'm not sure how CP users would use this
>>> feature.
>>>
>>> In general, I think we can provide bypass/cancel support in preXXX methods
>>> where it is the very beginning of an operation.
>>>
>>> Thanks.
>>>
>>> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
>>>
>>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
>>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>>> selectively.
>>>>
>>>> If we can handle the rest of the trouble that you observed:
>>>>
>>>> 1) Lack of recognition and identification of when the key value to
>>>> increment doesn't exist
>>>> 2) Lack of the ability to set the timestamp of the updated key value.
>>>>
>>>> then they might be able to make it work. Perhaps a conversion from HBase
>>>> native to Phoenix LONG encoding when processing results, in the wrapping
>>>> scanner, informed by schema metadata.
>>>>
>>>> Or if we are keeping the bypass semantic in select places but implementing
>>>> it with something other than today's bypass() API (please), this would be
>>>> another candidate for where to keep it. Duo suggests keeping the semantic
>>>> in all of the basic RPC preXXX hooks for query and mutation. We could redo
>>>> those APIs to skip normal processing based on a return value or exception
>>>> but otherwise drop bypass from all the others. It will clean up areas of
>>>> confusion, e.g. can I bypass splits or flushes or not? Or what about this
>>>> arcane hook in compaction? Or [insert some deep hook here]? The answer
>>>> would be: only RPC hooks will early out, and only if you return this
>>>> value, or throw that exception.
>>>>
>>>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
>>>>
>>>>> The YARN Timeline Server has the FlowRunCoprocessor. It does a bypass when
>>>>> a user does a Get, returning instead the result of its own (Flow) Scan.
>>>>> Not sure how we'd do an alternative here; Timeline Server is keeping Tags
>>>>> internally.
>>>>>
>>>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <ap...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Rather than continue to support a weird bypass() which works in some
>>>>>> places and not in others, perhaps we can substitute it with an exception?
>>>>>> So if the coprocessor throws this exception in the pre hook then where it
>>>>>> is allowed we catch it and do the right thing, and where it is not allowed
>>>>>> we don't catch it and the server aborts. This will at least improve the
>>>>>> silent bypass() failure problem. I also don't like, in retrospect, that
>>>>>> calling this environment method has magic side effects. Everyone
>>>>>> understands how exceptions work, so it will be clearer.
>>>>>
>>>>> We could do that, though throwing and catching exceptions would be costly.
>>>>>
>>>>> What about the Duo suggestion? Purge the bypass flag and replace it with
>>>>> preXXX in a few select methods returning a boolean on whether to bypass?
>>>>> Would that work? (Would have to figure out metrics still.)
>>>>>
>>>>>> In any case we should try to address the Tephra and Phoenix cases brought
>>>>>> up in this discussion. They look like we can find alternatives. Shall I
>>>>>> file JIRAs to follow up?
>>>>>
>>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
>>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>>> selectively.
>>>>>
>>>>> St.Ack
>
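
Andrew's exception-based alternative and Duo's concern about the cost of creating exceptions can be combined in one sketch: a single preallocated exception, built with suppression and stack-trace capture disabled, that only the call sites allowed to early-out catch. `ToyBypass`, `ToyHooks` and `ToyServer` are hypothetical names for illustration, not a proposed HBase API; the only real library feature relied on is the protected four-argument `RuntimeException` constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical signal exception. One shared instance is thrown every time; it is built
// with suppression and stack-trace capture disabled, so throwing it allocates nothing new.
final class ToyBypass extends RuntimeException {
  static final ToyBypass INSTANCE = new ToyBypass();

  private ToyBypass() {
    super("bypass requested", null, false, false);
  }
}

// Hypothetical hooks; both default to "do nothing".
interface ToyHooks {
  default void preDelete(String row) { }

  default void preSplit(String region) { }
}

final class ToyServer {
  final List<String> deleted = new ArrayList<>();
  final List<String> splitRegions = new ArrayList<>();
  private final ToyHooks cp;

  ToyServer(ToyHooks cp) { this.cp = cp; }

  // Early-out allowed: the signal is caught and the core delete is skipped.
  void delete(String row) {
    try {
      cp.preDelete(row);
    } catch (ToyBypass signal) {
      return;
    }
    deleted.add(row);            // core code
  }

  // Early-out not allowed: the signal is not caught, so misuse propagates to the
  // caller (standing in for "the server aborts") instead of being silently ignored.
  void split(String region) {
    cp.preSplit(region);
    splitRegions.add(region);    // core code
  }
}
```

Disabling suppression matters for a shared instance: otherwise any code calling addSuppressed() on it would accumulate state across unrelated throws. Even with a preallocated instance the throw still unwinds frames, so a plain boolean return is likely still the cheaper option, which is the trade-off debated above.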

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

Sometimes we don't arrive at a point where discussions happen and decisions
are made until code is about to ship. A general thing with open source, I
think. It is less than ideal but important to strike when that iron is
(eventually) hot.


On Tue, Oct 17, 2017 at 3:12 PM, Josh Elser <el...@apache.org> wrote:

> (catching up here)
>
> I'm glad to see you fine folks came to a conclusion around a reduced-scope
> solution (correct me if I'm wrong). "Some" bypass mechanism would stay for
> preXXX methods, and we'd remove it for the other methods? What exactly the
> "bypass API" would be is up in the air, correct?
>
> Duo -- maybe you could put the "current plan" on HBASE-18770 since
> discussion appears to have died down?
>
> I was originally lamenting yet another big, sweeping change to CPs when I
> had expected alpha-4 to have already landed. But, let me play devil's
> advocate: is this something we still think is critical to do in alpha-4? I
> can respect wanting to address all of these smells, but I worry it delays
> us further.
>
>
> On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>
>> Creating an exception is expensive so if it is not suggested to do it in a
>> normal case. A common trick is to create a global exception instance, and
>> always throw it to avoid creating every time but I think it is more
>> friendly to just use a return value?
>>
>> And for me, the bypass after preXXX for normal region operations just
>> equals to a 'cancel', which is very clear and easy to understand, so I
>> think it is OK to add bypass support for them. And also for compaction and
>> flush, it is OK to give CP users the ability to cancel the operation as
>> the
>> semantic is clear, although I'm not sure how CP users would use this
>> feature.
>>
>> In general, I think we can provide bypass/cancel support in preXXX methods
>> where it is the very beginning of an operation.
>>
>> Thanks.
>>
>> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:
>>
>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
>>>>
>>> its long encoding writing Increments. Not sure how we'd do that,
>>> selectively.
>>>
>>> If we can handle the rest of the trouble that you observed:
>>>
>>> 1) Lack of recognition and identification of when the key value to
>>> increment doesn't exist
>>> 2) Lack of the ability to set the timestamp of the updated key value.
>>>
>>> then they might be able to make it work. Perhaps a conversion from HBase
>>> native to Phoenix LONG encoding when processing results, in the wrapping
>>> scanner, informed by schema metadata.
>>>
>>> Or if we are keeping the bypass semantic in select places but
>>> implementing
>>> it with something other than today's bypass() API (please) this would be
>>> another candidate for where to keep it. Duo suggests keeping the semantic
>>> in all of the basic RPC preXXX hooks for query and mutation. We could
>>> redo
>>> those APIs to skip normal processing based on a return value or exception
>>> but otherwise drop bypass from all the others. It will clean up areas of
>>> confusion, e.g. can I bypass splits or flushes or not? Or what about this
>>> arcane hook in compaction? Or [insert some deep hook here]? The answer
>>> would be: only RPC hooks will early out, and only if you return this
>>> value,
>>> or throw that exception.
>>>
>>>
>>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:
>>>
>>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass when
>>>> user does a Get returning instead the result of its own (Flow) Scan
>>>>
>>> result.
>>>
>>>> Not sure how we'd do alternative here; Timeline Server is keeping Tags
>>>> internally.
>>>>
>>>>
>>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <ap...@apache.org>
>>>> wrote:
>>>>
>>>> Rather than continue to support a weird bypass() which works in some
>>>>>
>>>> places
>>>>
>>>>> and not in others, perhaps we can substitute it with an exception? So
>>>>>
>>>> if
>>>
>>>> the coprocessor throws this exception in the pre hook then where it is
>>>>> allowed we catch it and do the right thing, and where it is not allowed
>>>>>
>>>> we
>>>>
>>>>> don't catch it and the server aborts. This will at least improve the
>>>>>
>>>> silent
>>>>
>>>>> bypass() failure problem. I also don't like, in retrospect, that
>>>>>
>>>> calling
>>>
>>>> this environment method has magic side effects. Everyone understands
>>>>>
>>>> how
>>>
>>>> exceptions work, so it will be clearer.
>>>>>
>>>>>
>>>>> We could do that though throw and catch of exceptions would be costly.
>>>>
>>>> What about the Duo suggestion? Purge bypass flag and replace it w/
>>>> preXXX
>>>> in a few select methods returning a boolean on whether bypass? Would
>>>> that
>>>> work? (Would have to figure metrics still).
>>>>
>>>>
>>>>
>>>> In any case we should try to address the Tephra and Phoenix cases
>>>>>
>>>> brought
>>>
>>>> up in this discussion. They look like we can find alternatives. Shall I
>>>>> file JIRAs to follow up?
>>>>>
>>>>>
>>>>>
>>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
>>>> use
>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>> selectively.
>>>>
>>>> St.Ack
>>>>
>>>>
>>>>
>>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <pa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> These examples are great.
>>>>>>
>>>>>> And I think for normal region operations such as get, put, delete,
>>>>>> checkAndXXX, increment, it is OK to bypass the real operation after
>>>>>>
>>>>> preXXX
>>>>>
>>>>>> as the semantic is clear enough. Instead of calling env.bypass, maybe
>>>>>>
>>>>> just
>>>>>
>>>>>> let these preXXX methods return a boolean is enough to tell the HBase
>>>>>> framework that we have already done the real operation so just give
>>>>>>
>>>>> up
>>>
>>>> and
>>>>>
>>>>>> return?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:
>>>>>>
>>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
>>>>>>>
>>>>>> preDelete()
>>>>
>>>>> to
>>>>>>
>>>>>>> override handling of delete tombstones in a transactional way:
>>>>>>> https://github.com/apache/incubator-tephra/blob/master/
>>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/
>>>>>>>
>>>>>> hbase/coprocessor/
>>>>>>
>>>>>>> TransactionProcessor.java#L244
>>>>>>>
>>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in
>>>>>>>
>>>>>> preGetOp()
>>>
>>>> and
>>>>>
>>>>>> preIncrementAfterRRowLock() to provide a transaction implementation
>>>>>>>
>>>>>> of
>>>>
>>>>> readless increments:
>>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
>>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
>>>>>>> hbase11/IncrementHandler.java#L121
>>>>>>>
>>>>>>> What would be the alternate approach for these applications?  In
>>>>>>>
>>>>>> both
>>>
>>>> cases
>>>>>>
>>>>>>> they need to impose their own semantics on the underlying KeyValue
>>>>>>> storage.  Is there a different way this can be done?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <anoop.hbase@gmail.com
>>>>>>>
>>>>>>
>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Wrap core scanners is different right?  That can be done in post
>>>>>>>> hooks.  I have seen many use cases for this..  Its the question
>>>>>>>>
>>>>>>> abt
>>>
>>>> the pre hooks where we have not yet created the core object (like
>>>>>>>> scanner).  The CP pre code itself doing the work of object
>>>>>>>>
>>>>>>> creation
>>>
>>>> and so the core code is been bypassed.    Well the wrapping thing
>>>>>>>>
>>>>>>> can
>>>>
>>>>> be done in pre hook also. First create the core object by CP code
>>>>>>>> itself and then do the wrapped object and return.. I have seen in
>>>>>>>>
>>>>>>> one
>>>>
>>>>> jira issue where the usage was this way..   The wrapping can be
>>>>>>>>
>>>>>>> done
>>>>
>>>>> in post also in such cases I believe.
>>>>>>>>
>>>>>>>> -Anoop-
>>>>>>>>
>>>>
>>>
>>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
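The quoted text above includes Anoop's point that a scanner can be wrapped in a post hook instead of being created in a pre hook. The minimal, self-contained Java sketch below illustrates that point. `RowScanner`, `ScanObserver`, `Host` and `UpperCaseObserver` are hypothetical simplified stand-ins. They are not HBase's `RegionScanner` or `RegionObserver` API. In the sketch, a pre hook that returns a non-null scanner replaces core creation entirely (the bypass-like path). A post hook instead receives the scanner the core created and decorates it, so nothing is skipped.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a region scanner; NOT the HBase RegionScanner API.
interface RowScanner {
  /** Returns the next row, or null once the scan is exhausted. */
  String next();
}

// Hypothetical observer with one pre hook and one post hook around scanner creation.
interface ScanObserver {
  /** A non-null return replaces core scanner creation (the bypass-like path). */
  default RowScanner preScannerOpen() {
    return null;
  }

  /** Receives whatever scanner was created and may return a wrapper around it. */
  default RowScanner postScannerOpen(RowScanner scanner) {
    return scanner;
  }
}

// Hypothetical host showing where core work sits relative to the two hooks.
class Host {
  private final List<String> rows;
  private final ScanObserver observer;

  Host(List<String> rows, ScanObserver observer) {
    this.rows = rows;
    this.observer = observer;
  }

  RowScanner openScanner() {
    RowScanner scanner = observer.preScannerOpen();
    if (scanner == null) {
      // Core path: runs only when the pre hook did not supply its own scanner.
      Iterator<String> it = rows.iterator();
      scanner = () -> it.hasNext() ? it.next() : null;
    }
    return observer.postScannerOpen(scanner);
  }
}

// Wraps the core scanner in the post hook, so no core processing is skipped.
class UpperCaseObserver implements ScanObserver {
  @Override
  public RowScanner postScannerOpen(RowScanner core) {
    return () -> {
      String row = core.next();
      return row == null ? null : row.toUpperCase();
    };
  }
}

class ScanDemo {
  static List<String> drain(RowScanner scanner) {
    List<String> out = new ArrayList<>();
    for (String row = scanner.next(); row != null; row = scanner.next()) {
      out.add(row);
    }
    return out;
  }

  public static void main(String[] args) {
    Host host = new Host(List.of("a", "b"), new UpperCaseObserver());
    System.out.println(drain(host.openScanner())); // prints [A, B]
  }
}
```

The post-hook wrapper depends only on the scanner interface, not on which core code built the scanner. That is why this pattern would keep working without bypass, while the pre-hook replacement path would not.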

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Josh Elser <el...@apache.org>.

(catching up here)

I'm glad to see you fine folks came to a conclusion around a 
reduced-scope solution (correct me if I'm wrong). "Some" bypass 
mechanism would stay for preXXX methods, and we'd remove it for the 
other methods? What exactly the "bypass API" would be is up in the air, 
correct?

Duo -- maybe you could put the "current plan" on HBASE-18770 since 
discussion appears to have died down?

I was originally lamenting yet another big, sweeping change to CPs when 
I had expected alpha-4 to have already landed. But, let me play devil's 
advocate: is this something we still think is critical to do in alpha-4? 
I can respect wanting to address all of these smells, but I'd worry it 
delays us further.

On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
> Creating an exception is expensive, so it is not recommended to do it in
> the normal case. A common trick is to create a global exception instance
> and always throw it, to avoid creating a new one every time, but I think it
> is friendlier to just use a return value?
> 
> And for me, the bypass after preXXX for normal region operations just
> equals a 'cancel', which is very clear and easy to understand, so I
> think it is OK to add bypass support for them. And also for compaction and
> flush, it is OK to give CP users the ability to cancel the operation as the
> semantic is clear, although I'm not sure how CP users would use this
> feature.
> 
> In general, I think we can provide bypass/cancel support in preXXX methods
> that run at the very beginning of an operation.
> 
> Thanks.

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

Creating an exception is expensive, so it is not recommended to do it in the
normal case. A common trick is to create a global exception instance and
always throw it, to avoid creating a new one every time, but I think it is
friendlier to just use a return value?

And for me, the bypass after preXXX for normal region operations just
equals a 'cancel', which is very clear and easy to understand, so I
think it is OK to add bypass support for them. And also for compaction and
flush, it is OK to give CP users the ability to cancel the operation as the
semantic is clear, although I'm not sure how CP users would use this
feature.

In general, I think we can provide bypass/cancel support in preXXX methods
that run at the very beginning of an operation.

Thanks.
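The small Java sketch below compares Duo's two options. Most of the cost of creating an exception is usually the stack trace capture, which is why the "global instance" trick helps. The standard `RuntimeException(String, Throwable, boolean, boolean)` constructor lets a subclass disable stack trace capture and suppression. `BypassSignal` and `BypassStyles` are hypothetical names for illustration, not an HBase API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical bypass signal, preallocated once. Stack trace capture and
// suppression are disabled, so throwing it never calls fillInStackTrace().
final class BypassSignal extends RuntimeException {
  private static final long serialVersionUID = 1L;
  static final BypassSignal INSTANCE = new BypassSignal();

  private BypassSignal() {
    super("bypass requested by coprocessor", null, false, false);
  }
}

class BypassStyles {
  /** Exception style: the pre hook throws BypassSignal.INSTANCE to skip core work. */
  static String getWithException(Runnable preHook) {
    try {
      preHook.run();
    } catch (BypassSignal signal) {
      return "bypassed";
    }
    return "core result";
  }

  /** Return-value style: the pre hook returns true to skip core work. */
  static String getWithReturnValue(BooleanSupplier preHook) {
    if (preHook.getAsBoolean()) {
      return "bypassed";
    }
    return "core result";
  }

  public static void main(String[] args) {
    System.out.println(getWithException(() -> { throw BypassSignal.INSTANCE; })); // bypassed
    System.out.println(getWithReturnValue(() -> true));  // bypassed
    System.out.println(getWithReturnValue(() -> false)); // core result
  }
}
```

The shared exception is cheap to throw, but it carries no useful stack trace. Any call site that does not catch it simply lets it propagate. By contrast, the boolean return value is visible in the hook's signature.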

2017-10-12 3:10 GMT+08:00 Andrew Purtell <ap...@apache.org>:

> > On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
> its long encoding writing Increments. Not sure how we'd do that,
> selectively.
>
> If we can handle the rest of the trouble that you observed:
>
> 1) Lack of recognition and identification of when the key value to
> increment doesn't exist
> 2) Lack of the ability to set the timestamp of the updated key value.
>
> then they might be able to make it work. Perhaps a conversion from HBase
> native to Phoenix LONG encoding when processing results, in the wrapping
> scanner, informed by schema metadata.
>
> Or if we are keeping the bypass semantic in select places but implementing
> it with something other than today's bypass() API (please) this would be
> another candidate for where to keep it. Duo suggests keeping the semantic
> in all of the basic RPC preXXX hooks for query and mutation. We could redo
> those APIs to skip normal processing based on a return value or exception
> but otherwise drop bypass from all the others. It will clean up areas of
> confusion, e.g. can I bypass splits or flushes or not? Or what about this
> arcane hook in compaction? Or [insert some deep hook here]? The answer
> would be: only RPC hooks will early out, and only if you return this value,
> or throw that exception.
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
its long encoding writing Increments. Not sure how we'd do that,
selectively.

If we can handle the rest of the trouble that you observed:

1) Lack of recognition and identification of when the key value to
increment doesn't exist
2) Lack of the ability to set the timestamp of the updated key value.

then they might be able to make it work. Perhaps a conversion from HBase
native to Phoenix LONG encoding when processing results, in the wrapping
scanner, informed by schema metadata.
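The sketch below shows roughly how that conversion might look. It makes two assumptions. First, HBase's native increment value is an 8-byte big-endian two's-complement long, the layout `Bytes.toBytes(long)` produces. Second, Phoenix's LONG type stores the same 8 bytes with the sign bit flipped, so that encoded values sort correctly as unsigned bytes. The helper names are hypothetical. A real wrapping scanner would apply the conversion only to cells that the schema metadata marks as LONG. The sketch does not address the missing-cell or timestamp issues listed above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class LongEncodings {
  /** Native layout assumed here: 8-byte big-endian two's complement. */
  static byte[] nativeLong(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  /** Assumed Phoenix LONG layout: the native bytes with the sign bit flipped. */
  static byte[] nativeToPhoenixLong(byte[] nativeBytes) {
    if (nativeBytes.length != Long.BYTES) {
      throw new IllegalArgumentException("expected 8 bytes, got " + nativeBytes.length);
    }
    byte[] out = nativeBytes.clone();
    out[0] ^= (byte) 0x80; // flip sign bit so unsigned byte order matches numeric order
    return out;
  }

  /** Inverse of nativeToPhoenixLong: flip the sign bit back and read the long. */
  static long decodePhoenixLong(byte[] phoenixBytes) {
    byte[] nativeBytes = phoenixBytes.clone();
    nativeBytes[0] ^= (byte) 0x80;
    return ByteBuffer.wrap(nativeBytes).getLong();
  }

  public static void main(String[] args) {
    byte[] minusOne = nativeLong(-1L);
    byte[] plusOne = nativeLong(1L);
    // Native bytes: -1 (0xFF...) sorts after 1 (0x00...01) as unsigned bytes.
    System.out.println(Arrays.compareUnsigned(minusOne, plusOne) > 0); // true
    // Converted bytes: -1 (0x7F...) sorts before 1 (0x80...01).
    System.out.println(Arrays.compareUnsigned(
        nativeToPhoenixLong(minusOne), nativeToPhoenixLong(plusOne)) < 0); // true
  }
}
```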

Or if we are keeping the bypass semantic in select places but implementing
it with something other than today's bypass() API (please) this would be
another candidate for where to keep it. Duo suggests keeping the semantic
in all of the basic RPC preXXX hooks for query and mutation. We could redo
those APIs to skip normal processing based on a return value or exception
but otherwise drop bypass from all the others. It will clean up areas of
confusion, e.g. can I bypass splits or flushes or not? Or what about this
arcane hook in compaction? Or [insert some deep hook here]? The answer
would be: only RPC hooks will early out, and only if you return this value,
or throw that exception.
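One way to enforce the rule "only RPC hooks will early out, and only if you return this value" is to build it into the hook signatures. A non-RPC hook then cannot express a bypass at all. The sketch below is hypothetical Java. `PreResult`, `RpcHooks`, `InternalHooks` and `MiniRegion` are illustrative names, not the HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** What an RPC-level pre hook may return (hypothetical). */
enum PreResult { CONTINUE, SKIP }

/** RPC-level hooks: the only place an early out can be expressed. */
interface RpcHooks {
  default PreResult prePut(String row) {
    return PreResult.CONTINUE;
  }
}

/** Internal hooks return void, so they have no way to request a bypass. */
interface InternalHooks {
  default void preFlush() {}
}

class MiniRegion {
  final List<String> log = new ArrayList<>();
  private final RpcHooks rpcHooks;
  private final InternalHooks internalHooks;

  MiniRegion(RpcHooks rpcHooks, InternalHooks internalHooks) {
    this.rpcHooks = rpcHooks;
    this.internalHooks = internalHooks;
  }

  void put(String row) {
    if (rpcHooks.prePut(row) == PreResult.SKIP) {
      log.add("put skipped: " + row); // early out only because the hook returned SKIP
      return;
    }
    log.add("core put: " + row);
  }

  void flush() {
    internalHooks.preFlush();
    log.add("core flush"); // always runs: the hook's signature cannot cancel it
  }

  public static void main(String[] args) {
    RpcHooks skipTemp = new RpcHooks() {
      @Override
      public PreResult prePut(String row) {
        return row.startsWith("tmp") ? PreResult.SKIP : PreResult.CONTINUE;
      }
    };
    MiniRegion region = new MiniRegion(skipTemp, new InternalHooks() {});
    region.put("row1");
    region.put("tmp1");
    region.flush();
    System.out.println(region.log); // prints [core put: row1, put skipped: tmp1, core flush]
  }
}
```

With this shape, the question "can I bypass flushes or not?" is answered by the compiler rather than by the javadoc. A bypass request from an unsupported hook also can no longer be a silent no-op.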


On Wed, Oct 11, 2017 at 11:56 AM, Stack <st...@duboce.net> wrote:

> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass when
> user does a Get returning instead the result of its own (Flow) Scan result.
> Not sure how we'd do alternative here; Timeline Server is keeping Tags
> internally.
>
>
> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > Rather than continue to support a weird bypass() which works in some
> > places and not in others, perhaps we can substitute it with an exception?
> > So if the coprocessor throws this exception in the pre hook then where it
> > is allowed we catch it and do the right thing, and where it is not allowed
> > we don't catch it and the server aborts. This will at least improve the
> > silent bypass() failure problem. I also don't like, in retrospect, that
> > calling this environment method has magic side effects. Everyone
> > understands how exceptions work, so it will be clearer.
> >
> >
> We could do that though throw and catch of exceptions would be costly.
>
> What about the Duo suggestion? Purge bypass flag and replace it w/ preXXX
> in a few select methods returning a boolean on whether bypass? Would that
> work? (Would have to figure metrics still).
>
>
>
> > In any case we should try to address the Tephra and Phoenix cases brought
> > up in this discussion. They look like we can find alternatives. Shall I
> > file JIRAs to follow up?
> >
> >
> >
> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to use
> its long encoding writing Increments. Not sure how we'd do that,
> selectively.
>
> St.Ack
>
>
>
> > On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <pa...@gmail.com>
> > wrote:
> >
> > > These examples are great.
> > >
> > > And I think for normal region operations such as get, put, delete,
> > > checkAndXXX, increment, it is OK to bypass the real operation after
> > > preXXX as the semantic is clear enough. Instead of calling env.bypass,
> > > maybe just let these preXXX methods return a boolean is enough to tell
> > > the HBase framework that we have already done the real operation so
> > > just give up and return?
> > >
> > > Thanks.
> > >
> > > 2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:
> > >
> > > > The Tephra TransactionProcessor CP makes use of bypass() in
> > > > preDelete() to override handling of delete tombstones in a
> > > > transactional way:
> > > > https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
> > > >
> > > > The CDAP IncrementHandler CP also makes use of bypass() in preGetOp()
> > > > and preIncrementAfterRowLock() to provide a transaction
> > > > implementation of readless increments:
> > > > https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
> > > >
> > > > What would be the alternate approach for these applications?  In both
> > > > cases they need to impose their own semantics on the underlying
> > > > KeyValue storage.  Is there a different way this can be done?
> > > >
> > > >
> > > > On Tue, Oct 10, 2017 at 11:58 AM Anoop John <an...@gmail.com>
> > > > wrote:
> > > >
> > > > > Wrap core scanners is different right?  That can be done in post
> > > > > hooks.  I have seen many use cases for this..  Its the question abt
> > > > > the pre hooks where we have not yet created the core object (like
> > > > > scanner).  The CP pre code itself doing the work of object creation
> > > > > and so the core code is been bypassed.    Well the wrapping thing can
> > > > > be done in pre hook also. First create the core object by CP code
> > > > > itself and then do the wrapped object and return.. I have seen in one
> > > > > jira issue where the usage was this way..   The wrapping can be done
> > > > > in post also in such cases I believe.
> > > > >
> > > > > -Anoop-
> > > > >
> > > > > On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
> > apurtell@apache.org>
> > > > > wrote:
> > > > > > I think we should continue to support overriding function by
> object
> > > > > > inheritance. I didn't mention this and am not proposing more than
> > > > > removing
> > > > > > the bypass() sematic. No more no less. Phoenix absolutely depends
> > on
> > > > > being
> > > > > > able to wrap core scanners and return the wrappers.
> > > > > >
> > > > > >
> > > > > > On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
> > anoop.hbase@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> When we say bypass the core code, it can be done today not only
> by
> > > > > >> calling bypass but by returning a not null object for some of
> the
> > > pre
> > > > > >> hooks.  Like preScannerOpen() if it return a scanner object, we
> > will
> > > > > >> avoid the remaining core code execution for creation of the
> > > > > >> scanner(s).  So this proposal include this aspect also and
> remove
> > > any
> > > > > >> possible way of bypassing the core code by the CP hook code
> > > execution
> > > > > >> ?   Am +1.
> > > > > >>
> > > > > >> -Anoop-
> > > > > >>
> > > > > >> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <
> > > apurtell@apache.org
> > > > >
> > > > > >> wrote:
> > > > > >> > The coprocessor API provides an environment method, bypass(),
> > that
> > > > > when
> > > > > >> > called from a preXXX hook will cause the core code to skip all
> > > > > remaining
> > > > > >> > processing. This capability was introduced on HBASE-3348.
> Since
> > > this
> > > > > >> time I
> > > > > >> > think we are more enlightened about the complications of this
> > > > feature.
> > > > > >> (Or,
> > > > > >> > anyway, speaking for myself:)
> > > > > >> >
> > > > > >> > Not all hooks provide the bypass semantic. Where this is the
> > case
> > > > the
> > > > > >> > javadoc for the hook says so, but it can be missed. If you
> call
> > > > > bypass()
> > > > > >> in
> > > > > >> > a hook where it is not supported it is a no-op. This can lead
> > to a
> > > > > poor
> > > > > >> > developer experience.
> > > > > >> >
> > > > > >> > Where bypass is supported what is being bypassed is all of the
> > > core
> > > > > code
> > > > > >> > implementing the remainder of the operation. In order to
> > > understand
> > > > > what
> > > > > >> > calling bypass() will skip, a coprocessor implementer should
> > read
> > > > and
> > > > > >> > understand all of the remaining code and its nuances.
> Although I
> > > > think
> > > > > >> this
> > > > > >> > is good practice for coprocessor developers in general, it
> > > demands a
> > > > > >> lot. I
> > > > > >> > think it would provide a much better developer experience if
> we
> > > > didn't
> > > > > >> > allow bypass, even though it means - in theory - a coprocessor
> > > would
> > > > > be a
> > > > > >> > lot more limited in some ways than before. What is skipped is
> > > > > extremely
> > > > > >> > version dependent. That core code will vary, perhaps
> > > significantly,
> > > > > even
> > > > > >> > between point releases. We do not provide the promise of
> > > consistent
> > > > > >> > behavior even between point releases for the bypass semantic.
> To
> > > > > achieve
> > > > > >> > that we could not change any code between hook points.
> Therefore
> > > the
> > > > > >> > coprocessor implementer becomes an HBase core developer in
> > > practice
> > > > as
> > > > > >> soon
> > > > > >> > as they rely on bypass(). Every release of HBase may break the
> > > > > assumption
> > > > > >> > that the replacement for the bypassed code takes care of all
> > > > necessary
> > > > > >> > skipped concerns. Because those concerns can change at any
> > point,
> > > > > such an
> > > > > >> > assumption is never safe.
> > > > > >> >
> > > > > >> > I say "in theory" because I would be surprised if anyone is
> > > relying
> > > > on
> > > > > >> the
> > > > > >> > bypass for the above reason. I seem to recall that Phoenix
> might
> > > use
> > > > > it
> > > > > >> in
> > > > > >> > one place to promote a normal mutation into an atomic
> operation,
> > > by
> > > > > >> > substituting one for the other, but if so that objective could
> > be
> > > > > >> > reimplemented using their new locking manager.
> > > > > >> >
> > > > > >> >
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

The YARN Timeline Server has the FlowRunCoprocessor. It calls bypass() when
a user does a Get, returning instead the result of its own (Flow) Scan.
Not sure how we'd do the alternative here; the Timeline Server is keeping
Tags internally.
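A toy model of the bypass-flag pattern, in plain Java with no HBase dependency, may make this concrete. ToyContext, ToyObserver and ToyRegion are hypothetical stand-ins, not HBase's ObserverContext or RegionCoprocessorHost. preGet() answers a Get from the coprocessor's own data and calls bypass(), roughly the shape described above for FlowRunCoprocessor; preFlush() calls bypass() on a hook whose host never checks the flag.

```java
import java.util.ArrayList;
import java.util.List;

// Toy per-call context carrying a bypass flag (hypothetical, not HBase's ObserverContext).
class ToyContext {
    private boolean bypass = false;

    void bypass() { bypass = true; }

    boolean shouldBypass() { return bypass; }
}

// Hypothetical observer with one hook that honors bypass and one that does not.
interface ToyObserver {
    // The host checks the flag after this hook: bypass() skips the core Get.
    default void preGet(ToyContext ctx, String row, List<String> results) {}

    // The host never checks the flag after this hook: bypass() is a silent no-op.
    default void preFlush(ToyContext ctx) {}
}

class ToyRegion {
    private final ToyObserver observer;
    final List<String> coreLog = new ArrayList<>();

    ToyRegion(ToyObserver observer) { this.observer = observer; }

    List<String> get(String row) {
        ToyContext ctx = new ToyContext();
        List<String> results = new ArrayList<>();
        observer.preGet(ctx, row, results);
        if (ctx.shouldBypass()) {
            // Core read path skipped; the coprocessor's own results are returned as-is.
            return results;
        }
        coreLog.add("get " + row);
        results.add("core:" + row);
        return results;
    }

    void flush() {
        ToyContext ctx = new ToyContext();
        observer.preFlush(ctx);
        // No shouldBypass() check here, so the core flush always runs.
        coreLog.add("flush");
    }
}
```

In the flush case nothing tells the coprocessor author that bypass() was ignored, which is the silent no-op the thread opens with.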


On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <ap...@apache.org>
wrote:

> Rather than continue to support a weird bypass() which works in some places
> and not in others, perhaps we can substitute it with an exception? So if
> the coprocessor throws this exception in the pre hook then where it is
> allowed we catch it and do the right thing, and where it is not allowed we
> don't catch it and the server aborts. This will at least improve the silent
> bypass() failure problem. I also don't like, in retrospect, that calling
> this environment method has magic side effects. Everyone understands how
> exceptions work, so it will be clearer.
>
>
We could do that, though throwing and catching exceptions would be costly.

What about Duo's suggestion? Purge the bypass flag and replace it with
preXXX hooks, in a few select methods, that return a boolean saying whether
to bypass? Would that work? (We would still have to figure out metrics.)



> In any case we should try to address the Tephra and Phoenix cases brought
> up in this discussion. They look like we can find alternatives. Shall I
> file JIRAs to follow up?
>
>
>
On the Phoenix Increment bypass, an ornery item is that Phoenix wants to use
its own long encoding when writing Increments. Not sure how we'd do that
selectively.

St.Ack



Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

Rather than continue to support a weird bypass() which works in some places
and not in others, perhaps we can replace it with an exception? If the
coprocessor throws this exception in the pre hook, then where bypass is
allowed we catch it and do the right thing, and where it is not allowed we
don't catch it and the server aborts. This will at least fix the problem of
bypass() failing silently. In retrospect, I also don't like that calling
this environment method has magic side effects. Everyone understands how
exceptions work, so it will be clearer.
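A minimal sketch of the exception-based alternative, assuming a hypothetical BypassCoprocessorException (no such class exists in HBase as of this thread) and hypothetical ExceptionObserver/ExceptionHost types. The host catches the signal only around hooks where skipping the core operation is permitted; elsewhere it propagates, so misuse fails loudly. On the cost concern raised in the thread: the four-argument RuntimeException constructor (Java 7+) with writableStackTrace = false skips stack-trace capture, usually the expensive part of creating an exception. That is a design hedge, not a measured result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical signal a pre hook throws to ask the host to skip the core operation.
class BypassCoprocessorException extends RuntimeException {
    BypassCoprocessorException(String reason) {
        // No suppression and no stack trace: creating and throwing this stays cheap,
        // which matters if it is used as a routine control-flow signal.
        super(reason, null, false, false);
    }
}

// Hypothetical observer API: hooks simply throw to request a bypass.
interface ExceptionObserver {
    default void prePut(String row) {}

    default void preCompact(String store) {}
}

class ExceptionHost {
    private final ExceptionObserver observer;
    final List<String> applied = new ArrayList<>();

    ExceptionHost(ExceptionObserver observer) { this.observer = observer; }

    // Bypass is allowed here: the signal is caught and the core write is skipped.
    void put(String row) {
        try {
            observer.prePut(row);
        } catch (BypassCoprocessorException e) {
            return;
        }
        applied.add("put " + row);
    }

    // Bypass is not allowed here: nothing catches the signal, so it propagates to the
    // caller, standing in for "the server aborts" in the proposal.
    void compact(String store) {
        observer.preCompact(store);
        applied.add("compact " + store);
    }
}
```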

In any case we should try to address the Tephra and Phoenix cases brought
up in this discussion. It looks like we can find alternatives for them.
Shall I file JIRAs to follow up?
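For the Tephra case, one alternative suggested in this thread (Anoop) is preBatchMutate(): create Puts from the Deletes and add them with MiniBatchOperationInProgress#addOperationsFromCP, then mark each original Delete done with setOperationStatus(index, OperationStatus.SUCCESS) so the core code skips it. Below is a self-contained toy model of that flow. Op, PutOp, DeleteOp, ToyMiniBatch, DeleteAsPutObserver and ToyCore are simplified hypothetical stand-ins, not HBase classes, and using an empty value as the tombstone marker is an assumption for illustration only. The model follows the flow as described in the thread; whether the real core still applies CP-added operations for an index marked SUCCESS should be checked against the HBase version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for HBase's Mutation, Put and Delete.
abstract class Op {
    final String row;
    Op(String row) { this.row = row; }
}

final class PutOp extends Op {
    final String value;
    PutOp(String row, String value) { super(row); this.value = value; }
}

final class DeleteOp extends Op {
    DeleteOp(String row) { super(row); }
}

enum Status { NOT_RUN, SUCCESS }

// Toy mini-batch: the client's operations, a status per operation, and extra
// operations a coprocessor attaches to a given index.
final class ToyMiniBatch {
    final List<Op> ops;
    final Status[] status;
    final List<List<Op>> fromCp = new ArrayList<>();

    ToyMiniBatch(List<Op> ops) {
        this.ops = ops;
        this.status = new Status[ops.size()];
        for (int i = 0; i < ops.size(); i++) {
            status[i] = Status.NOT_RUN;
            fromCp.add(new ArrayList<>());
        }
    }

    void addOperationsFromCP(int index, List<Op> extra) { fromCp.get(index).addAll(extra); }

    void setOperationStatus(int index, Status s) { status[index] = s; }
}

final class DeleteAsPutObserver {
    // Assumption for illustration only: an empty value marks a transactional tombstone.
    static final String TOMBSTONE = "";

    void preBatchMutate(ToyMiniBatch batch) {
        for (int i = 0; i < batch.ops.size(); i++) {
            Op op = batch.ops.get(i);
            if (op instanceof DeleteOp) {
                List<Op> replacement = new ArrayList<>();
                replacement.add(new PutOp(op.row, TOMBSTONE));
                batch.addOperationsFromCP(i, replacement);
                // Mark the Delete as already handled so the core step skips it.
                batch.setOperationStatus(i, Status.SUCCESS);
            }
        }
    }
}

final class ToyCore {
    // Applies each operation still NOT_RUN, plus every operation a CP attached.
    static List<String> apply(ToyMiniBatch batch) {
        List<String> applied = new ArrayList<>();
        for (int i = 0; i < batch.ops.size(); i++) {
            if (batch.status[i] == Status.NOT_RUN) {
                applied.add(describe(batch.ops.get(i)));
            }
            for (Op extra : batch.fromCp.get(i)) {
                applied.add(describe(extra));
            }
        }
        return applied;
    }

    static String describe(Op op) {
        if (op instanceof PutOp) {
            return "put " + op.row + "='" + ((PutOp) op).value + "'";
        }
        return "delete " + op.row;
    }
}
```

For a batch of one Put and one Delete, the Put is applied as sent, the Delete itself is skipped, and the coprocessor's tombstone Put is applied in its place, all without calling bypass().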



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

These examples are great.

And I think for normal region operations such as get, put, delete,
checkAndXXX and increment, it is OK to bypass the real operation after
preXXX, as the semantics are clear enough. Instead of calling env.bypass,
maybe it is enough to just let these preXXX methods return a boolean that
tells the HBase framework we have already done the real operation, so it
should give up and return?
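A minimal sketch of the boolean-return idea, using hypothetical BooleanObserver/BooleanHost types rather than HBase's RegionObserver. Only hooks where bypass makes sense declare a boolean result; the host skips the core path when it is true. Because the return type is part of the hook signature, a hook that cannot bypass returns void and there is nothing to misuse. The AtomicLong counter is only a placeholder for the metrics question raised in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical observer API: only hooks that may bypass declare a boolean result.
interface BooleanObserver {
    // Return true if the coprocessor has fully handled the Get itself.
    default boolean preGet(String row, List<String> results) { return false; }

    // Cannot bypass: the void return type leaves nothing to misuse.
    default void preFlush() {}
}

final class BooleanHost {
    private final BooleanObserver observer;
    // Placeholder for whatever metric would count bypassed operations.
    final AtomicLong bypassedGets = new AtomicLong();

    BooleanHost(BooleanObserver observer) { this.observer = observer; }

    List<String> get(String row) {
        List<String> results = new ArrayList<>();
        if (observer.preGet(row, results)) {
            bypassedGets.incrementAndGet();
            return results; // core read path skipped
        }
        results.add("core:" + row);
        return results;
    }
}
```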

Thanks.

2017-10-11 3:19 GMT+08:00 Gary Helmling <gh...@gmail.com>:

> The Tephra TransactionProcessor CP makes use of bypass() in preDelete() to
> override handling of delete tombstones in a transactional way:
> https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
>
> The CDAP IncrementHandler CP also makes use of bypass() in preGetOp() and
> preIncrementAfterRowLock() to provide a transaction implementation of
> readless increments:
> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
>
> What would be the alternate approach for these applications? In both cases
> they need to impose their own semantics on the underlying KeyValue
> storage. Is there a different way this can be done?
>
>
> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <an...@gmail.com> wrote:
>
> > Wrapping core scanners is different, right? That can be done in post
> > hooks. I have seen many use cases for this. The question is about the
> > pre hooks, where we have not yet created the core object (like a
> > scanner). There the CP pre code itself does the work of object
> > creation, and so the core code is bypassed. Well, the wrapping can be
> > done in a pre hook also: first create the core object in the CP code
> > itself, then wrap it and return the wrapper. I have seen one JIRA issue
> > where it was used this way. I believe the wrapping can be done in the
> > post hook in such cases too.
> >
> > -Anoop-
> >
> > On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <ap...@apache.org>
> > wrote:
> > > I think we should continue to support overriding function by object
> > > inheritance. I didn't mention this and am not proposing more than
> > > removing the bypass() semantic. No more no less. Phoenix absolutely
> > > depends on being able to wrap core scanners and return the wrappers.
> > >
> > >
> > > On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <an...@gmail.com> wrote:
> > >
> > >> When we say "bypass the core code", it can be done today not only by
> > >> calling bypass() but also by returning a non-null object from some of
> > >> the pre hooks. For example, if preScannerOpen() returns a scanner
> > >> object, we skip the remaining core code that creates the scanner(s).
> > >> So does this proposal also include this aspect, removing any possible
> > >> way for CP hook code to bypass the core code? I am +1.
> > >>
> > >> -Anoop-
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >    - A23, Crosstalk
> >
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

See HBASE-18747. I'm currently working on it. I think this can be done by
wrapping an InternalScanner.
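A minimal sketch of the wrapping approach, assuming a simplified, hypothetical ToyScanner interface in place of HBase's InternalScanner (the real interface works on a List<Cell> and has variants taking a ScannerContext). The core code still builds and owns the underlying scanner; a post-style hook only wraps it and filters what flows through. Dropping empty values is an arbitrary example rule, and wrapCoreScanner is an illustrative name, not a RegionObserver method.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical stand-in for InternalScanner: adds the next row's values
// to `out` and returns true while more rows remain.
interface ToyScanner {
    boolean next(List<String> out);

    void close();
}

// Stand-in for the scanner the core code would build over the store.
final class ListScanner implements ToyScanner {
    private final Iterator<List<String>> rows;

    ListScanner(List<List<String>> rows) { this.rows = rows.iterator(); }

    @Override
    public boolean next(List<String> out) {
        if (rows.hasNext()) {
            out.addAll(rows.next());
        }
        return rows.hasNext();
    }

    @Override
    public void close() {}
}

// Wrapper a coprocessor returns instead of the core scanner: it delegates every call
// and drops empty values on the way out.
final class DroppingEmptyScanner implements ToyScanner {
    private final ToyScanner delegate;

    DroppingEmptyScanner(ToyScanner delegate) { this.delegate = delegate; }

    @Override
    public boolean next(List<String> out) {
        List<String> buffer = new ArrayList<>();
        boolean more = delegate.next(buffer);
        for (String value : buffer) {
            if (!value.isEmpty()) {
                out.add(value);
            }
        }
        return more;
    }

    @Override
    public void close() { delegate.close(); }
}

final class WrappingObserver {
    // Post-style hook: the core has already created `core`; we only wrap it.
    ToyScanner wrapCoreScanner(ToyScanner core) { return new DroppingEmptyScanner(core); }
}
```

Nothing in the core read path is skipped here; the coprocessor only sees and reshapes results after the core has produced them, which is why this style does not need bypass().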

2017-10-11 9:55 GMT+08:00 Anoop John <an...@gmail.com>:

> The discussion about pre hooks returning a non-null object came up in
> one of the CP cleanup issues. We were discussing whether we should expose
> APIs for StoreScanner object creation. As long as pre hooks can return an
> object this way, and so bypass the actual core code, we need to expose
> them. Duo raised this concern at that time. WDYT, Duo Zhang?
>
> -Anoop-
>
> On Wed, Oct 11, 2017 at 7:23 AM, Anoop John <an...@gmail.com> wrote:
> >> The Tephra TransactionProcessor CP makes use of bypass() in preDelete() to
> > This is interesting code to read. There is a way of doing this
> > without bypass: make use of the preBatchMutate() hook. This hook
> > takes a MiniBatchOperationInProgress, to which we can add Mutations
> > generated by CPs. Puts can be created from the Deletes and added
> > (#addOperationsFromCP). Also mark the corresponding Delete
> > mutations as done using #setOperationStatus(int index, OperationStatus
> > opStatus) with OperationStatus#SUCCESS. The core code will then
> > ignore these Delete mutations and process the Puts added by the CPs.
> >
> > This is just a suggestion on how the operation can be done without
> > using bypass. But the current way looks simpler; bypass was helpful here.
> >
> > -Anoop-
> >
> > On Wed, Oct 11, 2017 at 4:00 AM, Andrew Purtell <ap...@apache.org> wrote:
> >>> The Tephra TransactionProcessor CP makes use of bypass() in preDelete()
> >>> to override handling of delete tombstones in a transactional way
> >>
> >> Hmm. This is an interesting case. I can't think of how Deletes could be
> >> converted into Puts from this code, once the handling of the Delete is
> >> already in progress, but it could be possible to add another hook
> >> somewhere in RPC dispatch, before we have moved through RSRpcServices
> >> down into HRegion, and replace the incoming Delete with Puts there
> >> without a need to bypass.
> >>
> >>> The CDAP IncrementHandler CP also makes use of bypass() in preGetOp()
> >>> and preIncrementAfterRowLock() to provide a transaction implementation
> >>> of readless increments
> >>
> >> This one might be possible to achieve using the wrapped scanner and a
> >> replacement of the Get object handed down into core code with something
> >> that is useless and harmless rather than a bypass.
> >>
> >> These are great examples.
> >>
> >>
> >> On Tue, Oct 10, 2017 at 12:19 PM, Gary Helmling <gh...@gmail.com> wrote:
> >>
> >>> The Tephra TransactionProcessor CP makes use of bypass() in preDelete() to
> >>> override handling of delete tombstones in a transactional way:
> >>> https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
> >>>
> >>> The CDAP IncrementHandler CP also makes use of bypass() in preGetOp() and
> >>> preIncrementAfterRowLock() to provide a transaction implementation of
> >>> readless increments:
> >>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
> >>>
> >>> What would be the alternate approach for these applications? In both cases
> >>> they need to impose their own semantics on the underlying KeyValue
> >>> storage. Is there a different way this can be done?
> >>>
> >>>
> >>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <an...@gmail.com> wrote:
> >>>
> >>> > Wrapping core scanners is different, right? That can be done in post
> >>> > hooks. I have seen many use cases for this. The question is about the
> >>> > pre hooks, where we have not yet created the core object (like a
> >>> > scanner). There the CP pre code itself does the work of object
> >>> > creation, and so the core code is bypassed. Well, the wrapping can be
> >>> > done in a pre hook also: first create the core object in the CP code
> >>> > itself, then wrap it and return the wrapper. I have seen one JIRA issue
> >>> > where it was used this way. I believe the wrapping can be done in the
> >>> > post hook in such cases too.
> >>> >
> >>> > -Anoop-
> >>> >
> >>> > On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <apurtell@apache.org> wrote:
> >>> > > I think we should continue to support overriding function by object
> >>> > > inheritance. I didn't mention this and am not proposing more than
> >>> > > removing the bypass() semantic. No more no less. Phoenix absolutely
> >>> > > depends on being able to wrap core scanners and return the wrappers.
> >>> > >
> >>> > >
> >>> > > On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <anoop.hbase@gmail.com> wrote:
> >>> > >
> >>> > >> When we say "bypass the core code", it can be done today not only by
> >>> > >> calling bypass() but also by returning a non-null object from some of
> >>> > >> the pre hooks. For example, if preScannerOpen() returns a scanner
> >>> > >> object, we skip the remaining core code that creates the scanner(s).
> >>> > >> So does this proposal also include this aspect, removing any possible
> >>> > >> way for CP hook code to bypass the core code? I am +1.
> >>> > >>
> >>> > >> -Anoop-
> >>> > >>
> >>> > >> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >>> > >> > The coprocessor API provides an environment method, bypass(), that when
> >>> > >> > called from a preXXX hook will cause the core code to skip all remaining
> >>> > >> > processing. This capability was introduced on HBASE-3348. Since this time I
> >>> > >> > think we are more enlightened about the complications of this feature. (Or,
> >>> > >> > anyway, speaking for myself:)
> >>> > >> >
> >>> > >> > Not all hooks provide the bypass semantic. Where this is the case the
> >>> > >> > javadoc for the hook says so, but it can be missed. If you call bypass() in
> >>> > >> > a hook where it is not supported it is a no-op. This can lead to a poor
> >>> > >> > developer experience.
> >>> > >> >
> >>> > >> > Where bypass is supported what is being bypassed is all of the core code
> >>> > >> > implementing the remainder of the operation. In order to understand what
> >>> > >> > calling bypass() will skip, a coprocessor implementer should read and
> >>> > >> > understand all of the remaining code and its nuances. Although I think this
> >>> > >> > is good practice for coprocessor developers in general, it demands a lot. I
> >>> > >> > think it would provide a much better developer experience if we didn't
> >>> > >> > allow bypass, even though it means - in theory - a coprocessor would be a
> >>> > >> > lot more limited in some ways than before. What is skipped is extremely
> >>> > >> > version dependent. That core code will vary, perhaps significantly, even
> >>> > >> > between point releases. We do not provide the promise of consistent
> >>> > >> > behavior even between point releases for the bypass semantic. To achieve
> >>> > >> > that we could not change any code between hook points. Therefore the
> >>> > >> > coprocessor implementer becomes an HBase core developer in practice as soon
> >>> > >> > as they rely on bypass(). Every release of HBase may break the assumption
> >>> > >> > that the replacement for the bypassed code takes care of all
> >>> necessary
> >>> > >> > skipped concerns. Because those concerns can change at any
> point,
> >>> > such an
> >>> > >> > assumption is never safe.
> >>> > >> >
> >>> > >> > I say "in theory" because I would be surprised if anyone is
> relying
> >>> on
> >>> > >> the
> >>> > >> > bypass for the above reason. I seem to recall that Phoenix
> might use
> >>> > it
> >>> > >> in
> >>> > >> > one place to promote a normal mutation into an atomic
> operation, by
> >>> > >> > substituting one for the other, but if so that objective could
> be
> >>> > >> > reimplemented using their new locking manager.
> >>> > >> >
> >>> > >> > --
> >>> > >> > Best regards,
> >>> > >> > Andrew
> >>> > >>
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Best regards,
> >>> > > Andrew
> >>> > >
> >>> > > Words like orphans lost among the crosstalk, meaning torn from
> truth's
> >>> > > decrepit hands
> >>> > >    - A23, Crosstalk
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Andrew
> >>
> >> Words like orphans lost among the crosstalk, meaning torn from truth's
> >> decrepit hands
> >>    - A23, Crosstalk
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Anoop John <an...@gmail.com>.

The discussion about a pre hook returning a non-null object came up in one
of the CP cleanup issues. We were discussing whether we should expose
APIs for StoreScanner object creation. As long as pre hooks can return an
object this way, and so bypass the actual core code, we need to expose them.
Duo raised this concern at that time. WDYT, Duo Zhang?
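For readers less familiar with the pattern, here is a minimal, self-contained Java sketch of how a non-null return from a pre hook acts as an implicit bypass. SimpleScanner, PreScannerOpenHook and ScannerFactory are hypothetical stand-ins, not the real RegionObserver or StoreScanner APIs. The core code asks each hook first and, if one returns an object, never runs its own creation path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scanner; not the HBase RegionScanner interface.
interface SimpleScanner {
    String describe();
}

// Hypothetical pre hook: a non-null return replaces the core-created scanner.
interface PreScannerOpenHook {
    SimpleScanner preScannerOpen(String scanSpec);
}

final class ScannerFactory {
    private final List<PreScannerOpenHook> hooks = new ArrayList<>();
    // Set only when the core creation path actually runs.
    boolean coreCreationRan = false;

    void register(PreScannerOpenHook hook) {
        hooks.add(hook);
    }

    SimpleScanner open(String scanSpec) {
        for (PreScannerOpenHook hook : hooks) {
            SimpleScanner fromHook = hook.preScannerOpen(scanSpec);
            if (fromHook != null) {
                // Implicit bypass: the core creation path below never runs.
                return fromHook;
            }
        }
        return createCoreScanner(scanSpec);
    }

    // Stands in for whatever setup the core code performs; a hook that
    // returns early skips all of it, whatever it happens to be in this release.
    private SimpleScanner createCoreScanner(String scanSpec) {
        coreCreationRan = true;
        return () -> "core:" + scanSpec;
    }
}
```

Because the hook never calls bypass(), nothing in its signature signals that core setup is being skipped.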

-Anoop-

On Wed, Oct 11, 2017 at 7:23 AM, Anoop John <an...@gmail.com> wrote:
>>The Tephra TransactionProcessor CP makes use of bypass() in preDelete() to
> This is interesting code to read. There is a way to do this
> without bypass: use the preBatchMutate() hook. This hook
> takes a MiniBatchOperationInProgress, to which we can add Mutations
> generated by CPs. Puts can be created from the Deletes and added
> (#addOperationsFromCP). Also mark the corresponding Delete
> mutations as done using #setOperationStatus(int index, OperationStatus
> opStatus) with OperationStatus#SUCCESS. The core code will then
> ignore these Delete mutations and process the Puts added
> by the CPs.
>
> This is just a suggestion for how the operation can be done without
> bypass. But the current way looks simpler; bypass was of help here.
>
> -Anoop-

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Anoop John <an...@gmail.com>.

>The Tephra TransactionProcessor CP makes use of bypass() in preDelete() to
This is interesting code to read. There is a way to do this
without bypass: use the preBatchMutate() hook. This hook
takes a MiniBatchOperationInProgress, to which we can add Mutations
generated by CPs. Puts can be created from the Deletes and added
(#addOperationsFromCP). Also mark the corresponding Delete
mutations as done using #setOperationStatus(int index, OperationStatus
opStatus) with OperationStatus#SUCCESS. The core code will then
ignore these Delete mutations and process the Puts added
by the CPs.
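A toy Java model of the flow described above, to make the ordering explicit. MiniBatch, Op and Region are hypothetical simplifications whose method names only mirror the ones mentioned (addOperationsFromCP, setOperationStatus, OperationStatus#SUCCESS); this is not HBase code. The model assumes that the core still applies CP-added mutations attached to an entry already marked SUCCESS. That behavior should be checked against the HBase version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a mini-batch; not HBase classes.
enum OpType { PUT, DELETE }

enum Status { NOT_RUN, SUCCESS }

final class Op {
    final OpType type;
    final String row;

    Op(OpType type, String row) {
        this.type = type;
        this.row = row;
    }
}

final class MiniBatch {
    final List<Op> ops;
    final Status[] status;
    // Operations added by the coprocessor, attached to an original batch index.
    final List<List<Op>> fromCp = new ArrayList<>();

    MiniBatch(List<Op> ops) {
        this.ops = ops;
        this.status = new Status[ops.size()];
        for (int i = 0; i < ops.size(); i++) {
            status[i] = Status.NOT_RUN;
            fromCp.add(new ArrayList<>());
        }
    }

    void addOperationsFromCP(int index, List<Op> extra) {
        fromCp.get(index).addAll(extra);
    }

    void setOperationStatus(int index, Status s) {
        status[index] = s;
    }
}

final class Region {
    // Everything the core path applied, in order.
    final List<Op> applied = new ArrayList<>();

    // Coprocessor logic in the style of preBatchMutate(): replace each Delete
    // with a Put for the same row and mark the Delete itself as done.
    static void preBatchMutate(MiniBatch batch) {
        for (int i = 0; i < batch.ops.size(); i++) {
            Op op = batch.ops.get(i);
            if (op.type == OpType.DELETE) {
                batch.addOperationsFromCP(i, List.of(new Op(OpType.PUT, op.row)));
                batch.setOperationStatus(i, Status.SUCCESS);
            }
        }
    }

    void mutate(List<Op> ops) {
        MiniBatch batch = new MiniBatch(ops);
        preBatchMutate(batch);
        for (int i = 0; i < ops.size(); i++) {
            // The core ignores entries the CP already marked as done...
            if (batch.status[i] == Status.NOT_RUN) {
                applied.add(ops.get(i));
            }
            // ...and applies the mutations the CP attached at this index.
            applied.addAll(batch.fromCp.get(i));
        }
    }
}
```

For a batch of a Put on row "a" and a Delete on row "b", the core applies the Put on "a" and a CP-generated Put on "b". The Delete itself is never processed, and nothing was bypassed.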

This is just a suggestion for how the operation can be done without
bypass. But the current way looks simpler; bypass was of help here.

-Anoop-

On Wed, Oct 11, 2017 at 4:00 AM, Andrew Purtell <ap...@apache.org> wrote:
>> The Tephra TransactionProcessor CP makes use of bypass() in preDelete()
>> to override handling of delete tombstones in a transactional way
>
> Hmm. This is an interesting case. I can't think of how Deletes could be
> converted into Puts from this code once handling of the Delete is
> already in progress, but it could be possible to add another hook somewhere
> in RPC dispatch, before the request has moved through RSRpcServices down into
> HRegion, and replace the incoming Delete with Puts there without a need to
> bypass.
>
>> The CDAP IncrementHandler CP also makes use of bypass() in preGetOp() and
>> preIncrementAfterRowLock() to provide a transactional implementation of
>> readless increments
>
> This one might be possible to achieve using the wrapped scanner and a
> replacement of the Get object handed down into core code with something
> that is useless and harmless rather than a bypass.
>
> These are great examples.

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

I'm +1 on removing the general bypass support, but I think we can still
provide bypass logic for some operations if we can give clear and
stable semantics for what will be bypassed. We need to know some usages and
review the methods case by case.
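One possible shape for bypass with a clear, case-by-case contract, sketched in Java. It assumes each hook declares up front whether it supports bypass. HookContext is hypothetical, not the HBase ObserverContext. The point is that calling bypass() where it is unsupported fails loudly, instead of being the silent no-op described in the original proposal.

```java
// Hypothetical hook context; not the HBase ObserverContext API.
final class HookContext {
    private final String hookName;
    private final boolean bypassable;
    private boolean bypassed;

    HookContext(String hookName, boolean bypassable) {
        this.hookName = hookName;
        this.bypassable = bypassable;
    }

    // Only hooks that explicitly declare support accept a bypass request.
    void bypass() {
        if (!bypassable) {
            // Fail loudly rather than silently ignoring the request.
            throw new UnsupportedOperationException(hookName + " does not support bypass()");
        }
        bypassed = true;
    }

    boolean shouldBypass() {
        return bypassed;
    }
}
```

Under this assumption, core code would construct the context with bypassable set to true only for hooks whose skipped work has been reviewed and documented.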

Thanks.

2017-10-11 6:30 GMT+08:00 Andrew Purtell <ap...@apache.org>:

> > The Tephra TransactionProcessor CP makes use of bypass() in preDelete()
> > to override handling of delete tombstones in a transactional way
>
> Hmm. This is an interesting case. I can't think of how Deletes could be
> converted into Puts from this code once handling of the Delete is
> already in progress, but it could be possible to add another hook somewhere
> in RPC dispatch, before the request has moved through RSRpcServices down into
> HRegion, and replace the incoming Delete with Puts there without a need to
> bypass.
>
> > The CDAP IncrementHandler CP also makes use of bypass() in preGetOp() and
> > preIncrementAfterRowLock() to provide a transactional implementation of
> > readless increments
>
> This one might be possible to achieve using the wrapped scanner and a
> replacement of the Get object handed down into core code with something
> that is useless and harmless rather than a bypass.
>
> These are great examples.

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

> The Tephra TransactionProcessor CP makes use of bypass() in preDelete()
> to override handling of delete tombstones in a transactional way

Hmm. This is an interesting case. I can't think of how Deletes could be
converted into Puts from this code once handling of the Delete is
already in progress, but it could be possible to add another hook somewhere
in RPC dispatch, before the request has moved through RSRpcServices down into
HRegion, and replace the incoming Delete with Puts there without a need to
bypass.
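A rough Java sketch of the alternative described above. It assumes a hypothetical rewrite hook that runs before dispatch into the region; nothing here implies such a hook exists in HBase today. Mutations are modeled as plain strings for brevity. Because the region's normal path runs in full on the rewritten operations, nothing needs to be bypassed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical dispatcher with a rewrite hook that runs before the region
// sees the request; this is not an existing HBase extension point.
final class Dispatcher {
    private final Function<String, List<String>> rewriteHook;
    // Operations the region's normal (non-bypassed) path processed, in order.
    final List<String> executed = new ArrayList<>();

    Dispatcher(Function<String, List<String>> rewriteHook) {
        this.rewriteHook = rewriteHook;
    }

    void dispatch(List<String> request) {
        // Step 1: let the hook replace each incoming operation (e.g. Delete -> Puts).
        List<String> rewritten = new ArrayList<>();
        for (String op : request) {
            rewritten.addAll(rewriteHook.apply(op));
        }
        // Step 2: the region processes every rewritten operation in full.
        for (String op : rewritten) {
            executed.add(op);
        }
    }
}
```

Consider a hook that maps "DELETE:r2" to "PUT:r2:marker" and passes everything else through. With that hook, a request of "PUT:r1" and "DELETE:r2" reaches the region as "PUT:r1" and "PUT:r2:marker".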

> The CDAP IncrementHandler CP also makes use of bypass() in preGetOp() and
> preIncrementAfterRowLock() to provide a transactional implementation of
> readless increments

This one might be possible to achieve using the wrapped scanner and a
replacement of the Get object handed down into core code with something
that is useless and harmless rather than a bypass.
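A minimal Java sketch of the wrapped-scanner idea for readless increments. It rests on three assumptions: increments are stored as separate delta cells, the core scanner returns them grouped by column, and the wrapper (installed after the core scanner is created) collapses them into one summed value per column. DeltaCell and SummingScanner are hypothetical, and the replacement of the Get is not modeled.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical cell: a column name and a long value written as an increment delta.
final class DeltaCell {
    final String column;
    final long delta;

    DeltaCell(String column, long delta) {
        this.column = column;
        this.delta = delta;
    }
}

// Hypothetical wrapper around the core scanner's output: emits one cell per
// column whose value is the sum of that column's deltas.
final class SummingScanner implements Iterator<DeltaCell> {
    private final Iterator<DeltaCell> core;
    private DeltaCell pending; // first cell of the next column, read ahead

    SummingScanner(Iterator<DeltaCell> core) {
        this.core = core;
        this.pending = core.hasNext() ? core.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public DeltaCell next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String column = pending.column;
        long sum = pending.delta;
        pending = null;
        // Assumes the core scanner returns cells grouped by column.
        while (core.hasNext()) {
            DeltaCell c = core.next();
            if (!c.column.equals(column)) {
                pending = c;
                break;
            }
            sum += c.delta;
        }
        return new DeltaCell(column, sum);
    }
}
```

The core code still creates and runs its own scanner; the coprocessor only post-processes what that scanner returns.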

These are great examples.


On Tue, Oct 10, 2017 at 12:19 PM, Gary Helmling <gh...@gmail.com> wrote:

> The Tephra TransactionProcessor CP makes use of bypass() in preDelete() to
> override handling of delete tombstones in a transactional way:
> https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
>
> The CDAP IncrementHandler CP also makes use of bypass() in preGetOp() and
> preIncrementAfterRowLock() to provide a transactional implementation of
> readless increments:
> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
>
> What would be the alternate approach for these applications?  In both cases
> they need to impose their own semantics on the underlying KeyValue
> storage.  Is there a different way this can be done?
>
>
> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <an...@gmail.com> wrote:
>
> > Wrapping core scanners is different, right? That can be done in post
> > hooks. I have seen many use cases for this. The question is about
> > the pre hooks, where we have not yet created the core object (like
> > a scanner). The CP pre code itself does the work of object creation,
> > and so the core code is bypassed. Well, the wrapping can
> > be done in a pre hook too: first create the core object in the CP code
> > itself, then wrap it and return the wrapper. I have seen this usage in
> > one JIRA issue. I believe the wrapping could be done
> > in a post hook instead in such cases.
> >
> > -Anoop-
> >
> > On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <ap...@apache.org>
> > wrote:
> > > I think we should continue to support overriding function by object
> > > inheritance. I didn't mention this and am not proposing more than
> > > removing the bypass() semantic. No more, no less. Phoenix absolutely
> > > depends on being able to wrap core scanners and return the wrappers.
> > >
> > >
> > > On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <an...@gmail.com> wrote:
> > >
> > >> When we say bypass the core code, it can be done today not only by
> > >> calling bypass() but also by returning a non-null object from some of
> > >> the pre hooks. For example, if preScannerOpen() returns a scanner
> > >> object, we skip the remaining core code for creating the scanner(s).
> > >> So does this proposal also include this aspect, removing any possible
> > >> way for CP hook code to bypass the core code? I am +1.
> > >>
> > >> -Anoop-
> > >>
> > >> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell <apurtell@apache.org>
> > >> wrote:
> > >> > The coprocessor API provides an environment method, bypass(), that when
> > >> > called from a preXXX hook will cause the core code to skip all remaining
> > >> > processing. This capability was introduced on HBASE-3348. Since this time I
> > >> > think we are more enlightened about the complications of this feature. (Or,
> > >> > anyway, speaking for myself:)
> > >> >
> > >> > Not all hooks provide the bypass semantic. Where this is the case the
> > >> > javadoc for the hook says so, but it can be missed. If you call bypass() in
> > >> > a hook where it is not supported it is a no-op. This can lead to a poor
> > >> > developer experience.
> > >> >
> > >> > Where bypass is supported what is being bypassed is all of the core code
> > >> > implementing the remainder of the operation. In order to understand what
> > >> > calling bypass() will skip, a coprocessor implementer should read and
> > >> > understand all of the remaining code and its nuances. Although I think this
> > >> > is good practice for coprocessor developers in general, it demands a lot. I
> > >> > think it would provide a much better developer experience if we didn't
> > >> > allow bypass, even though it means - in theory - a coprocessor would be a
> > >> > lot more limited in some ways than before. What is skipped is extremely
> > >> > version dependent. That core code will vary, perhaps significantly, even
> > >> > between point releases. We do not provide the promise of consistent
> > >> > behavior even between point releases for the bypass semantic. To achieve
> > >> > that we could not change any code between hook points. Therefore the
> > >> > coprocessor implementer becomes an HBase core developer in practice as soon
> > >> > as they rely on bypass(). Every release of HBase may break the assumption
> > >> > that the replacement for the bypassed code takes care of all necessary
> > >> > skipped concerns. Because those concerns can change at any point, such an
> > >> > assumption is never safe.
> > >> >
> > >> > I say "in theory" because I would be surprised if anyone is relying on the
> > >> > bypass for the above reason. I seem to recall that Phoenix might use it in
> > >> > one place to promote a normal mutation into an atomic operation, by
> > >> > substituting one for the other, but if so that objective could be
> > >> > reimplemented using their new locking manager.
> > >> >
> > >> > --
> > >> > Best regards,
> > >> > Andrew
> > >>
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >    - A23, Crosstalk
> >
>



-- 
Best regards,
Andrew

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Gary Helmling <gh...@gmail.com>.

The Tephra TransactionProcessor CP makes use of bypass() in preDelete() to
override handling of delete tombstones in a transactional way:
https://github.com/apache/incubator-tephra/blob/master/tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/hbase/coprocessor/TransactionProcessor.java#L244
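A minimal model of the pattern described above, assuming (from our reading, not verified against the linked source) that the CP turns a Delete into a Put of an empty-valued marker cell and then calls bypass() so the core delete path never runs. HookContext, ToyRegion and TransactionalObserver are hypothetical stand-ins, not HBase or Tephra classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a transactional delete via bypass(); not HBase or Tephra code.
class TombstoneModel {
    static final byte[] EMPTY = new byte[0];

    // Stand-in for an observer context: a pre hook may request bypass.
    static final class HookContext {
        private boolean bypass;
        void bypass() { bypass = true; }
        boolean shouldBypass() { return bypass; }
    }

    // Stand-in for a region: row -> (timestamp -> value).
    static final class ToyRegion {
        final TreeMap<String, TreeMap<Long, byte[]>> rows = new TreeMap<>();
        final List<String> log = new ArrayList<>();

        void put(String row, long ts, byte[] value) {
            rows.computeIfAbsent(row, r -> new TreeMap<>()).put(ts, value);
        }

        // Core delete: physically drops versions at or below ts (simplified).
        void coreDelete(String row, long ts) {
            log.add("core delete " + row);
            TreeMap<Long, byte[]> versions = rows.get(row);
            if (versions != null) versions.headMap(ts, true).clear();
        }

        // Delete path: pre hook first, then core code unless the hook bypassed it.
        void delete(String row, long ts, TransactionalObserver cp) {
            HookContext ctx = new HookContext();
            cp.preDelete(ctx, this, row, ts);
            if (!ctx.shouldBypass()) coreDelete(row, ts);
        }

        // Read path: newest version wins; an empty value is read as "deleted".
        byte[] get(String row) {
            TreeMap<Long, byte[]> versions = rows.get(row);
            if (versions == null || versions.isEmpty()) return null;
            byte[] latest = versions.lastEntry().getValue();
            return latest.length == 0 ? null : latest;
        }
    }

    // Hypothetical CP: write a marker instead of deleting, then skip the core delete.
    static final class TransactionalObserver {
        void preDelete(HookContext ctx, ToyRegion region, String row, long ts) {
            region.put(row, ts, EMPTY);
            region.log.add("marker " + row);
            ctx.bypass();
        }
    }
}
```

Because the older versions survive, a rollback in this model only needs to remove the marker, which a physical core delete cannot offer.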

The CDAP IncrementHandler CP also makes use of bypass() in preGetOp() and
preIncrementAfterRowLock() to provide a transactional implementation of
readless increments:
https://github.com/caskdata/cdap/blob/develop/cdap-hbase-compat-1.1/src/main/java/co/cask/cdap/data2/increment/hbase11/IncrementHandler.java#L121
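A similar sketch for readless increments, assuming each increment is stored as its own delta cell and the deltas are summed at read time. Transaction visibility and everything else the real handler does are ignored, and all class and method names are hypothetical stand-ins that only mirror the hook names mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "readless" increments built on bypass(); not CDAP or HBase code.
class ReadlessIncrementModel {
    static final class HookContext {
        private boolean bypass;
        void bypass() { bypass = true; }
        boolean shouldBypass() { return bypass; }
    }

    static final class ToyRegion {
        final Map<String, List<Long>> cells = new HashMap<>();
        final IncrementObserver cp;
        int coreReads = 0; // times the core read-modify-write path ran

        ToyRegion(IncrementObserver cp) { this.cp = cp; }

        void increment(String row, long amount) {
            HookContext ctx = new HookContext();
            cp.preIncrementAfterRowLock(ctx, this, row, amount);
            if (ctx.shouldBypass()) return;
            // Core path: read the current total, add, and keep a single cell.
            coreReads++;
            List<Long> cell = cells.computeIfAbsent(row, r -> new ArrayList<>());
            long current = cell.isEmpty() ? 0L : cell.get(cell.size() - 1);
            cell.clear();
            cell.add(current + amount);
        }

        long get(String row) {
            HookContext ctx = new HookContext();
            long[] out = new long[1];
            cp.preGetOp(ctx, this, row, out);
            if (ctx.shouldBypass()) return out[0];
            List<Long> cell = cells.getOrDefault(row, new ArrayList<>());
            return cell.isEmpty() ? 0L : cell.get(cell.size() - 1);
        }
    }

    static final class IncrementObserver {
        // Blind write: append the delta without reading the current value.
        void preIncrementAfterRowLock(HookContext ctx, ToyRegion region, String row, long amount) {
            region.cells.computeIfAbsent(row, r -> new ArrayList<>()).add(amount);
            ctx.bypass();
        }

        // Read: sum all deltas and return that instead of the core result.
        void preGetOp(HookContext ctx, ToyRegion region, String row, long[] out) {
            long sum = 0L;
            for (long d : region.cells.getOrDefault(row, new ArrayList<>())) sum += d;
            out[0] = sum;
            ctx.bypass();
        }
    }
}
```

In this model both hooks replace a core result outright, so without bypass() neither substitution could be made from a pre hook.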

What would be the alternate approach for these applications?  In both cases
they need to impose their own semantics on the underlying KeyValue
storage.  Is there a different way this can be done?


On Tue, Oct 10, 2017 at 11:58 AM Anoop John <an...@gmail.com> wrote:

> Wrap core scanners is different right?  That can be done in post
> hooks.  I have seen many use cases for this..  Its the question abt
> the pre hooks where we have not yet created the core object (like
> scanner).  The CP pre code itself doing the work of object creation
> and so the core code is been bypassed.    Well the wrapping thing can
> be done in pre hook also. First create the core object by CP code
> itself and then do the wrapped object and return.. I have seen in one
> jira issue where the usage was this way..   The wrapping can be done
> in post also in such cases I believe.
>
> -Anoop-
>
> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <ap...@apache.org>
> wrote:
> > I think we should continue to support overriding function by object
> > inheritance. I didn't mention this and am not proposing more than
> > removing the bypass() semantic. No more, no less. Phoenix absolutely
> > depends on being able to wrap core scanners and return the wrappers.
> >
> >
> > On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <an...@gmail.com>
> > wrote:
> >
> >> When we say bypass the core code, it can be done today not only by
> >> calling bypass but by returning a not null object for some of the pre
> >> hooks.  Like preScannerOpen() if it return a scanner object, we will
> >> avoid the remaining core code execution for creation of the
> >> scanner(s).  So this proposal include this aspect also and remove any
> >> possible way of bypassing the core code by the CP hook code execution
> >> ?   Am +1.
> >>
> >> -Anoop-
> >>
> >
> >
> >
> > --
> > Best regards,
> > Andrew
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Anoop John <an...@gmail.com>.

Wrapping core scanners is different, right? That can be done in post
hooks. I have seen many use cases for this. The question is about the
pre hooks, where we have not yet created the core object (like a
scanner). There the CP pre-hook code itself does the work of object
creation, and so the core code is bypassed. Well, the wrapping can be
done in a pre hook too: first create the core object in the CP code
itself, then wrap it and return the wrapper. I have seen one JIRA issue
where it was used this way. I believe the wrapping can be done in a post
hook in such cases as well.
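A small model of the post-hook alternative: the core code always builds the scanner, and the CP only receives it afterwards and returns a wrapper, so nothing is bypassed. The Observer interface, the postScannerOpen stand-in and the "_"-prefix hiding policy are hypothetical, chosen only for illustration; this is not HBase's RegionObserver API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model: core code always builds the scanner; the CP only decorates it.
class PostHookWrapModel {
    interface Observer {
        // Stand-in for a post hook: gets the core scanner, may return a wrapper.
        default Iterator<String> postScannerOpen(Iterator<String> core) { return core; }
    }

    // Example wrapper: hides rows whose key starts with "_" (arbitrary policy).
    static final class HidingScanner implements Iterator<String> {
        private final Iterator<String> delegate;
        private String next;

        HidingScanner(Iterator<String> delegate) {
            this.delegate = delegate;
            advance();
        }

        // Move to the next visible row, or set next to null when exhausted.
        private void advance() {
            next = null;
            while (delegate.hasNext()) {
                String candidate = delegate.next();
                if (!candidate.startsWith("_")) {
                    next = candidate;
                    return;
                }
            }
        }

        @Override public boolean hasNext() { return next != null; }

        @Override public String next() {
            if (next == null) throw new NoSuchElementException();
            String result = next;
            advance();
            return result;
        }
    }

    static final class HidingObserver implements Observer {
        @Override public Iterator<String> postScannerOpen(Iterator<String> core) {
            return new HidingScanner(core);
        }
    }

    static final class ToyRegion {
        final List<String> rows = Arrays.asList("a", "_internal", "b");
        int coreScannersCreated = 0;

        Iterator<String> openScanner(Observer cp) {
            coreScannersCreated++;
            Iterator<String> core = rows.iterator(); // core creation is never skipped
            return cp.postScannerOpen(core);
        }
    }

    // Helper for the example: collect everything a scanner yields.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }
}
```

The CP never has to replicate the core creation logic here, which is the advantage of wrapping in the post hook over creating and wrapping in the pre hook.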

-Anoop-

On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <ap...@apache.org> wrote:
> I think we should continue to support overriding function by object
> inheritance. I didn't mention this and am not proposing more than removing
> the bypass() semantic. No more, no less. Phoenix absolutely depends on being
> able to wrap core scanners and return the wrappers.
>
>
> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <an...@gmail.com> wrote:
>
>> When we say bypass the core code, it can be done today not only by
>> calling bypass but by returning a not null object for some of the pre
>> hooks.  Like preScannerOpen() if it return a scanner object, we will
>> avoid the remaining core code execution for creation of the
>> scanner(s).  So this proposal include this aspect also and remove any
>> possible way of bypassing the core code by the CP hook code execution
>> ?   Am +1.
>>
>> -Anoop-
>>
>
>
>
> --
> Best regards,
> Andrew

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

I think we should continue to support overriding function by object
inheritance. I didn't mention this and am not proposing more than removing
the bypass() semantic. No more, no less. Phoenix absolutely depends on being
able to wrap core scanners and return the wrappers.


On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <an...@gmail.com> wrote:

> When we say bypass the core code, it can be done today not only by
> calling bypass but by returning a not null object for some of the pre
> hooks.  Like preScannerOpen() if it return a scanner object, we will
> avoid the remaining core code execution for creation of the
> scanner(s).  So this proposal include this aspect also and remove any
> possible way of bypassing the core code by the CP hook code execution
> ?   Am +1.
>
> -Anoop-
>



-- 
Best regards,
Andrew

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Anoop John <an...@gmail.com>.

When we say bypass the core code, it can be done today not only by
calling bypass() but also by returning a non-null object from some of
the pre hooks. For example, if preScannerOpen() returns a scanner
object, we skip the remaining core code that creates the scanner(s).
So does this proposal also cover this aspect and remove every possible
way for CP hook code to bypass the core code? I am +1.
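To make the non-null-return case concrete, here is a hypothetical model in which a pre hook that returns a scanner causes the core creation code to be skipped, with the same effect as calling bypass(). The types (Observer, PassThroughObserver, CannedObserver, ToyRegion) are stand-ins, not HBase's RegionObserver signatures.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of bypass-by-return-value; not HBase's RegionObserver API.
class PreHookReturnModel {
    interface Observer {
        // Returning non-null means "use this scanner and skip core creation".
        default Iterator<String> preScannerOpen() { return null; }
    }

    // CP that leaves scanner creation to the core.
    static final class PassThroughObserver implements Observer {}

    // Hypothetical CP that serves a canned scanner and so bypasses core creation.
    static final class CannedObserver implements Observer {
        @Override public Iterator<String> preScannerOpen() {
            return Arrays.asList("x").iterator();
        }
    }

    static final class ToyRegion {
        final List<String> rows = Arrays.asList("a", "b", "c");
        int coreScannersCreated = 0;

        Iterator<String> openScanner(Observer cp) {
            Iterator<String> fromCp = cp.preScannerOpen();
            if (fromCp != null) return fromCp; // core creation code never runs
            coreScannersCreated++;
            return rows.iterator();
        }
    }
}
```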

-Anoop-

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Andrew Purtell <ap...@apache.org>.

Agreed on the metrics. In my opinion this is tied to the bypass
consideration. Updating core metrics would be part of substituting core
code via bypass. Without bypass, it doesn't make sense for coprocessors
to touch core metrics: they can only add functionality, so they can
export their own metrics.
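A sketch of a CP keeping its own counters rather than writing to the core Get metrics, assuming a hypothetical caching observer. The metric names and the map-based registry are illustrative only and are not an HBase metrics API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: a CP that exports its own counters; core metrics are never touched.
class CpOwnMetrics {
    // The CP's private metric namespace.
    static final class CpMetrics {
        private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

        void inc(String name) {
            counters.computeIfAbsent(name, n -> new LongAdder()).increment();
        }

        long get(String name) {
            LongAdder adder = counters.get(name);
            return adder == null ? 0L : adder.sum();
        }
    }

    static final class CachingObserver {
        final Map<String, String> cache = new ConcurrentHashMap<>();
        final CpMetrics metrics = new CpMetrics();

        // Observe a completed Get (no bypass): count hits/misses in the CP's own metrics.
        void postGet(String row, String coreValue) {
            if (cache.containsKey(row)) {
                metrics.inc("cp.cache.hits");
            } else {
                metrics.inc("cp.cache.misses");
                if (coreValue != null) cache.put(row, coreValue);
            }
        }
    }
}
```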


On Tue, Oct 10, 2017 at 11:49 AM, Stack <st...@duboce.net> wrote:

> Thanks Andrew for starting the discussion.
>
> Up in JIRA we've also talked of a 'wonky' case where CPs need write-access
> to the internal Metrics system so a CP on by-pass is able to increment
> standard counters. For example, a CP might bypass a Get operation instead
> returning a Cell from a CP-managed cache. In this latter case, it might
> want to increment the Get metrics counters so the by-pass shows in the
> general Get counts.
>
> This is problematic but there is at least an instance happening downstream.
> We'd like to keep internal metrics internal. CPs should be able to publish
> their own metrics.
>
> Please speak up if you depend on by-pass.
>
> S
>



-- 
Best regards,
Andrew

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by ramkrishna vasudevan <ra...@gmail.com>.

We should probably fix the bypass-related hooks by first understanding
more use cases. Should we keep it only for mutations and scanner
creation? And if not a direct bypass, is there any other way we can
provide for CPs to create their own behaviour?

Regards
Ram

On Wed, Oct 11, 2017 at 9:07 AM, Stack <st...@duboce.net> wrote:

> On Tue, Oct 10, 2017 at 3:12 PM, Stack <st...@duboce.net> wrote:
>
> > I've been splunking Phoenix code and I see that in SequenceRegionObserver
> > and in Indexer, they do by-pass doing their own version of Increment.
> >
> >
>
> Following in the previous vein of offering alternatives, Phoenix has its
> own Increment because of "...deficiencies in Increment implementation
> (HBASE-10254)":
>
>  1) Lack of recognition and identification of when the key value to
> increment doesn't exist
>  2) Lack of the ability to set the timestamp of the updated key value.
>
>
> Works the same as existing region.increment(), except assumes there is a
> single column to increment and uses Phoenix LONG encoding.
>
>
> I think #1 in above is at least improved and if not good enough, we could
> fix. On #2, could add an ability to search the Increment payload for a
> timestamp to use. Assuming a single column seems like a downgrade. Phoenix
> LONG encoding might be hard to do unless we want to pull-in phoenix libs.
>
>
> St.Ack

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

On Tue, Oct 10, 2017 at 3:12 PM, Stack <st...@duboce.net> wrote:

> I've been splunking Phoenix code and I see that in SequenceRegionObserver
> and in Indexer, they do by-pass doing their own version of Increment.
>
>

Following in the previous vein of offering alternatives, Phoenix has its
own Increment because of "...deficiencies in Increment implementation
(HBASE-10254)":

 1) Lack of recognition and identification of when the key value to
increment doesn't exist
 2) Lack of the ability to set the timestamp of the updated key value.


Works the same as existing region.increment(), except assumes there is a
single column to increment and uses Phoenix LONG encoding.
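For context on the encoding point: a sketch of a sign-bit-flipped big-endian long, which is our understanding of how Phoenix stores LONG so that byte order matches numeric order (verify against Phoenix's PLong for your version). HBase's native Increment reads and writes plain big-endian two's-complement longs, so it cannot update such a cell directly; the class and method names below are hypothetical.

```java
import java.nio.ByteBuffer;

// Sketch of a sign-bit-flipped long codec (assumed to match Phoenix LONG; verify).
class SignFlippedLong {
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    // Plain big-endian two's complement, the layout a native HBase increment uses.
    static byte[] plain(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Unsigned lexicographic byte comparison, as HBase compares byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    // An increment that respects the encoding: decode, add, re-encode.
    static byte[] increment(byte[] stored, long delta) {
        return encode(decode(stored) + delta);
    }
}
```

With the plain layout, -1 sorts after 1 byte-wise; with the flipped layout it sorts before, which is why a generic increment would need to know the encoding.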


I think #1 above has at least been improved, and if it is still not
good enough, we could fix it. On #2, we could add an ability to search
the Increment payload for a timestamp to use. Assuming a single column
seems like a downgrade. Phoenix LONG encoding might be hard to do unless
we want to pull in Phoenix libs.
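One hypothetical shape for the two fixes discussed above: an increment result that reports whether the key value existed beforehand (#1), and an optional caller-supplied timestamp (#2). IncrementSketch and its fields are invented for illustration; this is not HBase's Increment API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical increment API illustrating the two gaps cited; not HBase's Increment.
class IncrementSketch {
    static final class Cell {
        final long value;
        final long timestamp;
        Cell(long value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    static final class Result {
        final boolean existedBefore; // #1: did the key value exist before this increment?
        final Cell cell;
        Result(boolean existedBefore, Cell cell) { this.existedBefore = existedBefore; this.cell = cell; }
    }

    final Map<String, Cell> store = new HashMap<>();
    long clock = 1000L; // stand-in for the server's current time

    // #2: use the caller's timestamp when the payload carries one.
    Result increment(String key, long delta, OptionalLong requestedTs) {
        Cell old = store.get(key);
        long base = old == null ? 0L : old.value;
        long ts = requestedTs.isPresent() ? requestedTs.getAsLong() : clock++;
        Cell updated = new Cell(base + delta, ts);
        store.put(key, updated);
        return new Result(old != null, updated);
    }
}
```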


St.Ack

> St.Ack
>
> On Tue, Oct 10, 2017 at 11:49 AM, Stack <st...@duboce.net> wrote:
>
>> Thanks Andrew for starting the discussion.
>>
>> Up in JIRA we've also talked of a 'wonky' case where CPs need
>> write-access to the internal Metrics system so a CP on by-pass is able to
>> increment standard counters. For example, a CP might bypass a Get operation
>> instead returning a Cell from a CP-managed cache. In this latter case, it
>> might want to increment the Get metrics counters so the by-pass shows in
>> the general Get counts.
>>
>> This is problematic but there is at least an instance happening
>> downstream. We'd like to keep internal metrics internal. CPs should be able
>> to publish their own metrics.
>>
>> Please speak up if you depend on by-pass.
>>
>> S
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

I've been splunking Phoenix code and I see that in SequenceRegionObserver
and in Indexer, they use by-pass to do their own version of Increment.
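A minimal model of what substituting an Increment via by-pass means, plus the no-op behaviour on hooks without bypass support that the opening post complains about. HookContext, ToyRegion, OwnIncrementObserver and the choice of preFlush as a non-bypassable hook are hypothetical stand-ins, not HBase's ObserverContext or Phoenix's SequenceRegionObserver.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of bypass(); not HBase's ObserverContext or Phoenix code.
class BypassIncrementModel {
    static final class HookContext {
        private final boolean supportsBypass;
        private boolean bypass;
        HookContext(boolean supportsBypass) { this.supportsBypass = supportsBypass; }
        // On a hook without bypass support the call is silently ignored.
        void bypass() { if (supportsBypass) bypass = true; }
        boolean shouldBypass() { return bypass; }
    }

    interface Observer {
        default void preIncrement(HookContext ctx, ToyRegion r, String row, long delta) {}
        default void preFlush(HookContext ctx, ToyRegion r) {}
    }

    static final class ToyRegion {
        final Map<String, Long> values = new HashMap<>();
        final List<String> trace = new ArrayList<>();

        void increment(String row, long delta, Observer cp) {
            HookContext ctx = new HookContext(true); // this hook honours bypass
            cp.preIncrement(ctx, this, row, delta);
            if (ctx.shouldBypass()) return;
            // Stands in for the whole core path (in a real server: WAL, memstore, metrics...).
            trace.add("core increment");
            values.merge(row, delta, Long::sum);
        }

        void flush(Observer cp) {
            HookContext ctx = new HookContext(false); // hypothetically, no bypass support here
            cp.preFlush(ctx, this);
            if (ctx.shouldBypass()) return; // never true for this hook
            trace.add("core flush");
        }
    }

    // Hypothetical CP that substitutes its own increment logic.
    static final class OwnIncrementObserver implements Observer {
        @Override public void preIncrement(HookContext ctx, ToyRegion r, String row, long delta) {
            r.trace.add("cp increment");
            r.values.merge(row, delta, Long::sum);
            ctx.bypass(); // core increment is skipped
        }

        @Override public void preFlush(HookContext ctx, ToyRegion r) {
            ctx.bypass(); // no effect: the flush still runs
        }
    }
}
```

Everything under the core branch is what the CP silently takes responsibility for once it bypasses, which is the maintenance burden the thread is weighing.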

St.Ack

On Tue, Oct 10, 2017 at 11:49 AM, Stack <st...@duboce.net> wrote:

> Thanks Andrew for starting the discussion.
>
> Up in JIRA we've also talked of a 'wonky' case where CPs need write access
> to the internal Metrics system so that a CP doing a by-pass can still
> increment the standard counters. For example, a CP might bypass a Get
> operation and instead return a Cell from a CP-managed cache. In this case,
> it might want to increment the Get metrics counters so the by-pass shows
> up in the general Get counts.
>
> This is problematic, but there is at least one instance of it happening
> downstream. We'd like to keep internal metrics internal. CPs should be able
> to publish their own metrics.
>
> Please speak up if you depend on by-pass.
>
> S
>
>> --
>> Best regards,
>> Andrew
>>
>
>

Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs

Posted by Stack <st...@duboce.net>.

On Tue, Oct 10, 2017 at 11:10 AM, Andrew Purtell <ap...@apache.org>
wrote:

> The coprocessor API provides an environment method, bypass(), that when
> called from a preXXX hook will cause the core code to skip all remaining
> processing. This capability was introduced on HBASE-3348. Since this time I
> think we are more enlightened about the complications of this feature. (Or,
> anyway, speaking for myself:)
>
> Not all hooks provide the bypass semantic. Where this is the case the
> javadoc for the hook says so, but it can be missed. If you call bypass() in
> a hook where it is not supported, it is a no-op. This can lead to a poor
> developer experience.
>
> Where bypass is supported, what is being bypassed is all of the core code
> implementing the remainder of the operation. In order to understand what
> calling bypass() will skip, a coprocessor implementer should read and
> understand all of the remaining code and its nuances. Although I think this
> is good practice for coprocessor developers in general, it demands a lot. I
> think it would provide a much better developer experience if we didn't
> allow bypass, even though it means - in theory - a coprocessor would be a
> lot more limited in some ways than before. What is skipped is extremely
> version dependent. That core code will vary, perhaps significantly, even
> between point releases. We do not provide the promise of consistent
> behavior even between point releases for the bypass semantic. To achieve
> that we could not change any code between hook points. Therefore the
> coprocessor implementer becomes an HBase core developer in practice as soon
> as they rely on bypass(). Every release of HBase may break the assumption
> that the replacement for the bypassed code takes care of all necessary
> skipped concerns. Because those concerns can change at any point, such an
> assumption is never safe.
>
> I say "in theory" because I would be surprised if anyone is relying on the
> bypass for the above reason. I seem to recall that Phoenix might use it in
> one place to promote a normal mutation into an atomic operation, by
> substituting one for the other, but if so that objective could be
> reimplemented using their new locking manager.
>
>
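
The bypass semantic described in the quoted message can be modeled in a few lines. This is a hypothetical Python sketch, not the HBase coprocessor API: HookContext, Region, CachingObserver, pre_get and pre_flush are invented names. It shows the two hazards raised: a bypass-capable hook skips all remaining core processing, and calling bypass() in a hook that does not support it is silently ignored.

```python
# Hypothetical model of a pre-hook bypass; not the HBase coprocessor API.
# HookContext, Region, CachingObserver, pre_get and pre_flush are invented names.


class HookContext:
    """Per-call context that a pre-hook may use to request a bypass."""

    def __init__(self):
        self.bypass_requested = False

    def bypass(self):
        # Only a request: whether it has any effect depends on the caller.
        self.bypass_requested = True


class Region:
    def __init__(self, observer):
        self.observer = observer
        self.store = {}
        self.core_steps = []  # which core operations actually ran

    def get(self, row):
        ctx = HookContext()
        result = self.observer.pre_get(ctx, row)
        if ctx.bypass_requested:
            # Bypass-capable hook: all remaining core processing is skipped,
            # including whatever a later release adds to this path.
            return result
        self.core_steps.append("get")
        return self.store.get(row)

    def flush(self):
        ctx = HookContext()
        self.observer.pre_flush(ctx)
        # This hook never checks ctx.bypass_requested, so bypass() is a no-op.
        self.core_steps.append("flush")


class CachingObserver:
    def __init__(self, cache):
        self.cache = cache

    def pre_get(self, ctx, row):
        if row in self.cache:
            ctx.bypass()
            return self.cache[row]
        return None

    def pre_flush(self, ctx):
        ctx.bypass()  # silently ignored: the core flush still runs


region = Region(CachingObserver({"r1": "cached"}))
region.store["r2"] = "stored"
print(region.get("r1"))   # cached  (core get skipped)
print(region.get("r2"))   # stored  (core get ran)
region.flush()
print(region.core_steps)  # ['get', 'flush']
```

Nothing in the observer can tell from inside pre_flush that its bypass() call was ignored, which is the poor developer experience described above.
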

Thanks Andrew for starting the discussion.

Up in JIRA we've also talked of a 'wonky' case where CPs need write access
to the internal Metrics system so that a CP doing a by-pass can still
increment the standard counters. For example, a CP might bypass a Get
operation and instead return a Cell from a CP-managed cache. In this case, it
might want to increment the Get metrics counters so the by-pass shows up in
the general Get counts.

This is problematic, but there is at least one instance of it happening
downstream. We'd like to keep internal metrics internal. CPs should be able to
publish their own metrics.
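
A minimal sketch of the alternative suggested here, where the CP publishes its own metrics rather than writing into the internal Get counters. It is a hypothetical Python model, not HBase code: CachingObserver, RegionServer, pre_get, core_metrics and the counter names are invented for illustration, and cache handling is deliberately simplified.

```python
# Hypothetical model: a caching coprocessor publishes its own counters instead
# of writing into the server's internal Get metrics. Not HBase code; all
# class, method and counter names are invented for illustration.
from collections import Counter


class CachingObserver:
    def __init__(self, cache):
        self.cache = cache
        self.metrics = Counter()  # owned and published by the coprocessor

    def pre_get(self, row):
        """Return (bypass, value); bypass=True means skip the core read."""
        if row in self.cache:
            self.metrics["cache_hit_gets"] += 1
            return True, self.cache[row]
        self.metrics["cache_miss_gets"] += 1
        return False, None


class RegionServer:
    def __init__(self, observer):
        self.observer = observer
        self.store = {}
        self._get_count = 0  # internal metric, never handed to observers

    def get(self, row):
        bypass, value = self.observer.pre_get(row)
        if bypass:
            return value
        self._get_count += 1  # only the core read path updates this
        return self.store.get(row)

    def core_metrics(self):
        return {"gets": self._get_count}


observer = CachingObserver({"r1": "cached"})
server = RegionServer(observer)
server.store["r2"] = "stored"
for row in ("r1", "r2", "r2"):
    server.get(row)

# An operator wanting the overall figure combines the two published sources.
total_gets = server.core_metrics()["gets"] + observer.metrics["cache_hit_gets"]
print(server.core_metrics(), dict(observer.metrics), total_gets)
# {'gets': 2} {'cache_hit_gets': 1, 'cache_miss_gets': 2} 3
```

The design choice is that neither side writes to the other's counters: the core reports only the Gets it executed, and the CP reports only the Gets it answered from its cache.
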

Please speak up if you depend on by-pass.

S

> --
> Best regards,
> Andrew
>