You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2017/10/04 22:15:44 UTC

Looking for input on an alpha-4 thorny item

A bunch of us are making good progress on the next alpha release,
hbase-2.0.0-alpha-4. The theme for this release is "Fixing the Coprocessor
API", mostly undoing access accidentally granted Coprocessors. I'm talking
out loud about a particularly awkward item here rather than in a comment up
in JIRA so the airing sees a broader audience. Interested in any opinions
or input you might have.

TL;DR MasterService/RegionServerService and Region, etc., Interfaces were
overloaded serving two, disparate roles; a load of refactoring has to be
done to undo the damage. Suggestions for how to avoid making same mistake
in future?

I'm working on "HBASE-12260 MasterServices - remove from coprocessor API
(Discuss)". MasterServices started out as a subset of the Master
functionality. The idea back then was that certain Services and Managers
could make do w/ less-than-full-access to the HMaster process. If so, we
could test the Service and Manager without having to standup a full HMaster
instance (This usually required our putting up a cluster too). If
MasterServices had but a few methods, a Mock would be easy, making testing
easier still. We did the same thing on the RegionServer side where we had
RegionServerServices.

MasterServices (and RegionServerServices) were also exposed to
Coprocessors. The idea was that CPs could ask-for particular Master
functions via MasterService. In this role MasterServices was meant to
constrain what CPs had access to.

MasterServices therefore had two consumers; one internal, the other not.

With time, MasterServices got fat as internal Services and Managers needed
more of HMaster. Everytime we added to MasterServices, CPs got access.

On survey as part of the recent HBASE-12260 work, it turns out that the
bulk of the methods in MasterServices are actually annotated
InterfaceAudience.Private; i.e. for internal use only, not for
Coprocessors. A brutal purge of Private audience objects, makes for a
MasterServices we can pass Coprocessors but Coprocessors now have much less
facility available (for parts, there are alternatives; Andy review suggests
that CPs are severely crimped if this patch goes in). But MasterServices
can no longer be used for its original purpose, passing Services and
Managers a subset of HMaster. The latter brings on a substantial refactor.

Another example of the double-role problem outlined above was found by Duo
and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a
similar tangle. There was the RegionServerServices <=> MasterServices case
but also the exposure of HRegion internals. In this latter, Region was
introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
Coprocessors. Subsequently, we all confused his original intent and went
ahead and thought of Region (as opposed to HRegion) as an Interface for
HRegion and plumbed it throughout the code base in place of explicit
HRegion references. As Region picked up functionality, Coprocessors gained
more access.

The refactoring pattern that has emerged out of the RegionServer-side
refactoring (which is ahead of the Master-work), is that we move to use the
HRegion implementation everywhere internally, undoing use of Region
Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp. I'm
following suit on the Master side moving to use HMaster in place of
MasterServices in all launched Services and Managers.

How do we avoid this mistake in future? Should we rename Region as
CoprocessorRegion so it more plain that its consumer is Coprocessors? Ditto
on MasterServices?

Thanks,
S

Re: Looking for input on an alpha-4 thorny item

Posted by Andrew Purtell <ap...@apache.org>.
I think at least at the Store level we want to admit the possibility of
alternate implementations.

Region is pushing it. It would be ideal of course to have nice clean
interfaces which could be mocked or given to new implementations, but the
legacy aspects of the code base probably make that more work than worth the
effort unless someone is really going to try going all the way to a new
type of Region.


On Fri, Oct 6, 2017 at 9:33 AM, Stack <st...@duboce.net> wrote:

> On Thu, Oct 5, 2017 at 9:30 AM, Josh Elser <el...@apache.org> wrote:
>
> > (I think I understand the problems enough to comment, but, admittedly, my
> > 5minute read is probably lacking)
> >
> >
> Thanks for chiming-in Josh.
>
>
>
> > I think the only argument against what you all have outlined here is if,
> > in the future, we have some intent to create new implementations of
> HRegion
> > or HStore. If that's the case, Region could still be the CP's "view" of
> > what a region is, we could introduce another interface which defines what
> > the internal/private "view" of a region is, and then we plug in the
> > implementation of choice (e.g. HRegion as we know it now would become
> > something like HRegionImpl).
> >
> >
>
> Currently our codebase entertains the 'fantasy' that it is possible to plug
> in alternate Region and Store implementations; there is a config that
> allows you stipulate another Region class. As part of the Duo/Anoop
> pare-back of CPs on RegionServer-side, it was noted that we should disabuse
> ourselves of all such notions. It just doesn't work. We've not expended the
> effort to keep clean Region/Store Interfaces (Witness this note on
> confusion around intent of Region Interface for example). There are no
> tests.
>
>
>
> > However, I'm struggling to come up with a concrete use-case as to why we
> > would need that extra level of indirection. As such, I think the below
> > proposal makes sense.
> >
> >
> Yeah. We could do the work to backfill an 'HRegion Interface', but years
> later, there have been no takers. It'd be a fun project for sure but would
> have to have good justification (Justification would have to include why
> use HBase at all and not something like Apache Helix instead).
>
> Good stuff,
> S
>
>
>
>
>
> > The distillation of a tricky issue is quite appreciated!
> >
> >
> > On 10/4/17 6:51 PM, Andrew Purtell wrote:
> >
> >> I think it is fine to rebrand these interfaces as for coprocessors and
> tag
> >> them LP(COPROC):
> >>
> >>      Region (use HRegion in internals)
> >>
> >>      Store (use HStore in internals)
> >>
> >>      MasterServices (use HMaster in internals)
> >>
> >>      RegionServerServices (use HRegionServer in internals)
> >>
> >> and pare them down. This is not a complete list, just a handful of
> >> examples.
> >>
> >> Some specific points of feedback I have had:
> >>
> >> In Region, it would be good for CPs to be able to schedule async flushes
> >> and compactions, poll or wait for completion of a specific request, or
> >> wait
> >> for all pending flushes and compactions to complete. There is a Phoenix
> >> use
> >> case for this in indexing.
> >>
> >> Security coprocessors use MasterServices to create their system tables.
> >> Maybe this can be replaced by using the normal admin API for same.
> >>
> >> When removing access to internal services, consider if there are client
> >> API
> >> equivalents that the CP can use, and if embedded calls to such client
> APIs
> >> from the coprocessor context would be a good idea. CP invocation of an
> >> internal service is simply an in-process method call. That's good and
> bad,
> >> right? The bad part, direct access, is the thing we want to restrict,
> the
> >> motivation for this work (in part). But the good thing is it avoids a
> lot
> >> of the fat client logic unnecessary for all-in-process service
> invocation,
> >> which might even not work correctly. Removing everything is as drastic
> as
> >> allowing CPs access to everything. It could be fine to drastically pare
> >> down, but please consider it.
> >>
> >> Some changes have been proposed that removes access to metrics (e.g.
> >> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
> >> function and replace it. Until and unless we remove the bypass semantic
> >> (under discussion) we should continue to allow CPs access to metrics
> >> objects so they can update metrics as expected by admins and users when
> >> replacing functionality (via bypass). Metrics are a public facing API. I
> >> agree this is kind of dodgy. I believe we should remove the bypass
> >> semantic. Once that is done, coprocessors can only mix in additional
> >> functionality. No more cause to touch core metrics. They can export
> their
> >> own metrics if so desired.
> >> ​​
> >>
> >> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
> >> ​​
> >>
> >> A bunch of us are making good progress on the next alpha release,
> >>
> >>> hbase-2.0.0-alpha-4. The theme for this release is "Fixing the
> >>> Coprocessor
> >>> API", mostly undoing access accidentally granted Coprocessors. I'm
> >>> talking
> >>> out loud about a particularly awkward item here rather than in a
> comment
> >>> up
> >>> in JIRA so the airing sees a broader audience. Interested in any
> opinions
> >>> or input you might have.
> >>>
> >>> TL;DR MasterService/RegionServerService and Region, etc., Interfaces
> >>> were
> >>> overloaded serving two, disparate roles; a load of refactoring has to
> be
> >>> done to undo the damage. Suggestions for how to avoid making same
> mistake
> >>> in future?
> >>>
> >>> I'm working on "HBASE-12260 MasterServices - remove from coprocessor
> API
> >>> (Discuss)". MasterServices started out as a subset of the Master
> >>> functionality. The idea back then was that certain Services and
> Managers
> >>> could make do w/ less-than-full-access to the HMaster process. If so,
> we
> >>> could test the Service and Manager without having to standup a full
> >>> HMaster
> >>> instance (This usually required our putting up a cluster too). If
> >>> MasterServices had but a few methods, a Mock would be easy, making
> >>> testing
> >>> easier still. We did the same thing on the RegionServer side where we
> had
> >>> RegionServerServices.
> >>>
> >>> MasterServices (and RegionServerServices) were also exposed to
> >>> Coprocessors. The idea was that CPs could ask-for particular Master
> >>> functions via MasterService. In this role MasterServices was meant to
> >>> constrain what CPs had access to.
> >>>
> >>> MasterServices therefore had two consumers; one internal, the other
> not.
> >>>
> >>> With time, MasterServices got fat as internal Services and Managers
> >>> needed
> >>> more of HMaster. Everytime we added to MasterServices, CPs got access.
> >>>
> >>> On survey as part of the recent HBASE-12260 work, it turns out that the
> >>> bulk of the methods in MasterServices are actually annotated
> >>> InterfaceAudience.Private; i.e. for internal use only, not for
> >>> Coprocessors. A brutal purge of Private audience objects, makes for a
> >>> MasterServices we can pass Coprocessors but Coprocessors now have much
> >>> less
> >>> facility available (for parts, there are alternatives; Andy review
> >>> suggests
> >>> that CPs are severely crimped if this patch goes in). But
> MasterServices
> >>> can no longer be used for its original purpose, passing Services and
> >>> Managers a subset of HMaster. The latter brings on a substantial
> >>> refactor.
> >>>
> >>> Another example of the double-role problem outlined above was found by
> >>> Duo
> >>> and Anoop in the RegionServer Coprocessor refactor salt mine. They hit
> a
> >>> similar tangle. There was the RegionServerServices <=> MasterServices
> >>> case
> >>> but also the exposure of HRegion internals. In this latter, Region was
> >>> introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
> >>> Coprocessors. Subsequently, we all confused his original intent and
> went
> >>> ahead and thought of Region (as opposed to HRegion) as an Interface for
> >>> HRegion and plumbed it throughout the code base in place of explicit
> >>> HRegion references. As Region picked up functionality, Coprocessors
> >>> gained
> >>> more access.
> >>>
> >>> The refactoring pattern that has emerged out of the RegionServer-side
> >>> refactoring (which is ahead of the Master-work), is that we move to use
> >>> the
> >>> HRegion implementation everywhere internally, undoing use of Region
> >>> Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp.
> I'm
> >>> following suit on the Master side moving to use HMaster in place of
> >>> MasterServices in all launched Services and Managers.
> >>>
> >>> How do we avoid this mistake in future? Should we rename Region as
> >>> CoprocessorRegion so it more plain that its consumer is Coprocessors?
> >>> Ditto
> >>> on MasterServices?
> >>>
> >>> Thanks,
> >>> S
> >>>
> >>>
> >>
> >>
> >>
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: Looking for input on an alpha-4 thorny item

Posted by Stack <st...@duboce.net>.
On Thu, Oct 5, 2017 at 9:30 AM, Josh Elser <el...@apache.org> wrote:

> (I think I understand the problems enough to comment, but, admittedly, my
> 5minute read is probably lacking)
>
>
Thanks for chiming-in Josh.



> I think the only argument against what you all have outlined here is if,
> in the future, we have some intent to create new implementations of HRegion
> or HStore. If that's the case, Region could still be the CP's "view" of
> what a region is, we could introduce another interface which defines what
> the internal/private "view" of a region is, and then we plug in the
> implementation of choice (e.g. HRegion as we know it now would become
> something like HRegionImpl).
>
>

Currently our codebase entertains the 'fantasy' that it is possible to plug
in alternate Region and Store implementations; there is a config that
allows you stipulate another Region class. As part of the Duo/Anoop
pare-back of CPs on RegionServer-side, it was noted that we should disabuse
ourselves of all such notions. It just doesn't work. We've not expended the
effort to keep clean Region/Store Interfaces (Witness this note on
confusion around intent of Region Interface for example). There are no
tests.



> However, I'm struggling to come up with a concrete use-case as to why we
> would need that extra level of indirection. As such, I think the below
> proposal makes sense.
>
>
Yeah. We could do the work to backfill an 'HRegion Interface', but years
later, there have been no takers. It'd be a fun project for sure but would
have to have good justification (Justification would have to include why
use HBase at all and not something like Apache Helix instead).

Good stuff,
S





> The distillation of a tricky issue is quite appreciated!
>
>
> On 10/4/17 6:51 PM, Andrew Purtell wrote:
>
>> I think it is fine to rebrand these interfaces as for coprocessors and tag
>> them LP(COPROC):
>>
>>      Region (use HRegion in internals)
>>
>>      Store (use HStore in internals)
>>
>>      MasterServices (use HMaster in internals)
>>
>>      RegionServerServices (use HRegionServer in internals)
>>
>> and pare them down. This is not a complete list, just a handful of
>> examples.
>>
>> Some specific points of feedback I have had:
>>
>> In Region, it would be good for CPs to be able to schedule async flushes
>> and compactions, poll or wait for completion of a specific request, or
>> wait
>> for all pending flushes and compactions to complete. There is a Phoenix
>> use
>> case for this in indexing.
>>
>> Security coprocessors use MasterServices to create their system tables.
>> Maybe this can be replaced by using the normal admin API for same.
>>
>> When removing access to internal services, consider if there are client
>> API
>> equivalents that the CP can use, and if embedded calls to such client APIs
>> from the coprocessor context would be a good idea. CP invocation of an
>> internal service is simply an in-process method call. That's good and bad,
>> right? The bad part, direct access, is the thing we want to restrict, the
>> motivation for this work (in part). But the good thing is it avoids a lot
>> of the fat client logic unnecessary for all-in-process service invocation,
>> which might even not work correctly. Removing everything is as drastic as
>> allowing CPs access to everything. It could be fine to drastically pare
>> down, but please consider it.
>>
>> Some changes have been proposed that removes access to metrics (e.g.
>> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
>> function and replace it. Until and unless we remove the bypass semantic
>> (under discussion) we should continue to allow CPs access to metrics
>> objects so they can update metrics as expected by admins and users when
>> replacing functionality (via bypass). Metrics are a public facing API. I
>> agree this is kind of dodgy. I believe we should remove the bypass
>> semantic. Once that is done, coprocessors can only mix in additional
>> functionality. No more cause to touch core metrics. They can export their
>> own metrics if so desired.
>> ​​
>>
>> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
>> ​​
>>
>> A bunch of us are making good progress on the next alpha release,
>>
>>> hbase-2.0.0-alpha-4. The theme for this release is "Fixing the
>>> Coprocessor
>>> API", mostly undoing access accidentally granted Coprocessors. I'm
>>> talking
>>> out loud about a particularly awkward item here rather than in a comment
>>> up
>>> in JIRA so the airing sees a broader audience. Interested in any opinions
>>> or input you might have.
>>>
>>> TL;DR MasterService/RegionServerService and Region, etc., Interfaces
>>> were
>>> overloaded serving two, disparate roles; a load of refactoring has to be
>>> done to undo the damage. Suggestions for how to avoid making same mistake
>>> in future?
>>>
>>> I'm working on "HBASE-12260 MasterServices - remove from coprocessor API
>>> (Discuss)". MasterServices started out as a subset of the Master
>>> functionality. The idea back then was that certain Services and Managers
>>> could make do w/ less-than-full-access to the HMaster process. If so, we
>>> could test the Service and Manager without having to standup a full
>>> HMaster
>>> instance (This usually required our putting up a cluster too). If
>>> MasterServices had but a few methods, a Mock would be easy, making
>>> testing
>>> easier still. We did the same thing on the RegionServer side where we had
>>> RegionServerServices.
>>>
>>> MasterServices (and RegionServerServices) were also exposed to
>>> Coprocessors. The idea was that CPs could ask-for particular Master
>>> functions via MasterService. In this role MasterServices was meant to
>>> constrain what CPs had access to.
>>>
>>> MasterServices therefore had two consumers; one internal, the other not.
>>>
>>> With time, MasterServices got fat as internal Services and Managers
>>> needed
>>> more of HMaster. Everytime we added to MasterServices, CPs got access.
>>>
>>> On survey as part of the recent HBASE-12260 work, it turns out that the
>>> bulk of the methods in MasterServices are actually annotated
>>> InterfaceAudience.Private; i.e. for internal use only, not for
>>> Coprocessors. A brutal purge of Private audience objects, makes for a
>>> MasterServices we can pass Coprocessors but Coprocessors now have much
>>> less
>>> facility available (for parts, there are alternatives; Andy review
>>> suggests
>>> that CPs are severely crimped if this patch goes in). But MasterServices
>>> can no longer be used for its original purpose, passing Services and
>>> Managers a subset of HMaster. The latter brings on a substantial
>>> refactor.
>>>
>>> Another example of the double-role problem outlined above was found by
>>> Duo
>>> and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a
>>> similar tangle. There was the RegionServerServices <=> MasterServices
>>> case
>>> but also the exposure of HRegion internals. In this latter, Region was
>>> introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
>>> Coprocessors. Subsequently, we all confused his original intent and went
>>> ahead and thought of Region (as opposed to HRegion) as an Interface for
>>> HRegion and plumbed it throughout the code base in place of explicit
>>> HRegion references. As Region picked up functionality, Coprocessors
>>> gained
>>> more access.
>>>
>>> The refactoring pattern that has emerged out of the RegionServer-side
>>> refactoring (which is ahead of the Master-work), is that we move to use
>>> the
>>> HRegion implementation everywhere internally, undoing use of Region
>>> Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp. I'm
>>> following suit on the Master side moving to use HMaster in place of
>>> MasterServices in all launched Services and Managers.
>>>
>>> How do we avoid this mistake in future? Should we rename Region as
>>> CoprocessorRegion so it more plain that its consumer is Coprocessors?
>>> Ditto
>>> on MasterServices?
>>>
>>> Thanks,
>>> S
>>>
>>>
>>
>>
>>

Re: Looking for input on an alpha-4 thorny item

Posted by Josh Elser <el...@apache.org>.
(I think I understand the problems enough to comment, but, admittedly, 
my 5minute read is probably lacking)

I think the only argument against what you all have outlined here is if, 
in the future, we have some intent to create new implementations of 
HRegion or HStore. If that's the case, Region could still be the CP's 
"view" of what a region is, we could introduce another interface which 
defines what the internal/private "view" of a region is, and then we 
plug in the implementation of choice (e.g. HRegion as we know it now 
would become something like HRegionImpl).

However, I'm struggling to come up with a concrete use-case as to why we 
would need that extra level of indirection. As such, I think the below 
proposal makes sense.

The distillation of a tricky issue is quite appreciated!

On 10/4/17 6:51 PM, Andrew Purtell wrote:
> I think it is fine to rebrand these interfaces as for coprocessors and tag
> them LP(COPROC):
> 
>      Region (use HRegion in internals)
> 
>      Store (use HStore in internals)
> 
>      MasterServices (use HMaster in internals)
> 
>      RegionServerServices (use HRegionServer in internals)
> 
> and pare them down. This is not a complete list, just a handful of examples.
> 
> Some specific points of feedback I have had:
> 
> In Region, it would be good for CPs to be able to schedule async flushes
> and compactions, poll or wait for completion of a specific request, or wait
> for all pending flushes and compactions to complete. There is a Phoenix use
> case for this in indexing.
> 
> Security coprocessors use MasterServices to create their system tables.
> Maybe this can be replaced by using the normal admin API for same.
> 
> When removing access to internal services, consider if there are client API
> equivalents that the CP can use, and if embedded calls to such client APIs
> from the coprocessor context would be a good idea. CP invocation of an
> internal service is simply an in-process method call. That's good and bad,
> right? The bad part, direct access, is the thing we want to restrict, the
> motivation for this work (in part). But the good thing is it avoids a lot
> of the fat client logic unnecessary for all-in-process service invocation,
> which might even not work correctly. Removing everything is as drastic as
> allowing CPs access to everything. It could be fine to drastically pare
> down, but please consider it.
> 
> Some changes have been proposed that removes access to metrics (e.g.
> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
> function and replace it. Until and unless we remove the bypass semantic
> (under discussion) we should continue to allow CPs access to metrics
> objects so they can update metrics as expected by admins and users when
> replacing functionality (via bypass). Metrics are a public facing API. I
> agree this is kind of dodgy. I believe we should remove the bypass
> semantic. Once that is done, coprocessors can only mix in additional
> functionality. No more cause to touch core metrics. They can export their
> own metrics if so desired.
> ​​
> 
> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
> ​​
> 
> A bunch of us are making good progress on the next alpha release,
>> hbase-2.0.0-alpha-4. The theme for this release is "Fixing the Coprocessor
>> API", mostly undoing access accidentally granted Coprocessors. I'm talking
>> out loud about a particularly awkward item here rather than in a comment up
>> in JIRA so the airing sees a broader audience. Interested in any opinions
>> or input you might have.
>>
>> TL;DR MasterService/RegionServerService and Region, etc., Interfaces were
>> overloaded serving two, disparate roles; a load of refactoring has to be
>> done to undo the damage. Suggestions for how to avoid making same mistake
>> in future?
>>
>> I'm working on "HBASE-12260 MasterServices - remove from coprocessor API
>> (Discuss)". MasterServices started out as a subset of the Master
>> functionality. The idea back then was that certain Services and Managers
>> could make do w/ less-than-full-access to the HMaster process. If so, we
>> could test the Service and Manager without having to standup a full HMaster
>> instance (This usually required our putting up a cluster too). If
>> MasterServices had but a few methods, a Mock would be easy, making testing
>> easier still. We did the same thing on the RegionServer side where we had
>> RegionServerServices.
>>
>> MasterServices (and RegionServerServices) were also exposed to
>> Coprocessors. The idea was that CPs could ask-for particular Master
>> functions via MasterService. In this role MasterServices was meant to
>> constrain what CPs had access to.
>>
>> MasterServices therefore had two consumers; one internal, the other not.
>>
>> With time, MasterServices got fat as internal Services and Managers needed
>> more of HMaster. Everytime we added to MasterServices, CPs got access.
>>
>> On survey as part of the recent HBASE-12260 work, it turns out that the
>> bulk of the methods in MasterServices are actually annotated
>> InterfaceAudience.Private; i.e. for internal use only, not for
>> Coprocessors. A brutal purge of Private audience objects, makes for a
>> MasterServices we can pass Coprocessors but Coprocessors now have much less
>> facility available (for parts, there are alternatives; Andy review suggests
>> that CPs are severely crimped if this patch goes in). But MasterServices
>> can no longer be used for its original purpose, passing Services and
>> Managers a subset of HMaster. The latter brings on a substantial refactor.
>>
>> Another example of the double-role problem outlined above was found by Duo
>> and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a
>> similar tangle. There was the RegionServerServices <=> MasterServices case
>> but also the exposure of HRegion internals. In this latter, Region was
>> introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
>> Coprocessors. Subsequently, we all confused his original intent and went
>> ahead and thought of Region (as opposed to HRegion) as an Interface for
>> HRegion and plumbed it throughout the code base in place of explicit
>> HRegion references. As Region picked up functionality, Coprocessors gained
>> more access.
>>
>> The refactoring pattern that has emerged out of the RegionServer-side
>> refactoring (which is ahead of the Master-work), is that we move to use the
>> HRegion implementation everywhere internally, undoing use of Region
>> Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp. I'm
>> following suit on the Master side moving to use HMaster in place of
>> MasterServices in all launched Services and Managers.
>>
>> How do we avoid this mistake in future? Should we rename Region as
>> CoprocessorRegion so it more plain that its consumer is Coprocessors? Ditto
>> on MasterServices?
>>
>> Thanks,
>> S
>>
> 
> 
> 

Re: Looking for input on an alpha-4 thorny item

Posted by Andrew Purtell <ap...@apache.org>.
> I agree that what we publish as metrics out over our jmx interface is
public. What is interesting though is that all Metrics implementation
classes are IA.private. And what we push out to operators is read-only.

Yep the problem as I see it is those metrics that are published via Hadoop
metrics or scraped from JMX go to production monitoring systems and are
consumed to produce reports, alerts, predictive analytics on performance
trends, etc.  If a coprocessor bypasses core code that updates a metric and
replaces that core functionality, then if the metrics are not updated by
the coprocessor, the accumulated counts in those external systems are no
longer representative of what is happening in the server. On the other hand
I see your point on metrics interface annotations. I think it is one reason
out of many why bypass seems not a good idea. The effects are going to be
very version dependent and unstable, and considerations like whether or not
metrics should be updated expect the implementer to study what they are
replacing in detail and understand all the ramifications, which given the
Phoenix example is hit or miss.


On Mon, Oct 9, 2017 at 11:24 AM, Stack <st...@duboce.net> wrote:

> On Fri, Oct 6, 2017 at 9:10 AM, Stack <st...@duboce.net> wrote:
>
> > ....
> >
> >
> >> Some changes have been proposed that removes access to metrics (e.g.
> >> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
> >> function and replace it. Until and unless we remove the bypass semantic
> >> (under discussion) we should continue to allow CPs access to metrics
> >> objects so they can update metrics as expected by admins and users when
> >> replacing functionality (via bypass). Metrics are a public facing API. I
> >> agree this is kind of dodgy. I believe we should remove the bypass
> >> semantic. Once that is done, coprocessors can only mix in additional
> >> functionality. No more cause to touch core metrics. They can export
> their
> >> own metrics if so desired.
> >> ​​
> >>
> >
>
> Where is the by-pass conversation happening? On quick review, I am unable
> to find it. I'm interested given the above.
>
> Above you say metrics are 'public'. I agree that what we publish as metrics
> out over our jmx interface is public. What is interesting though is that
> all Metrics implementation classes are IA.private. And what we push out to
> operators is read-only.
>
> So, yeah, interested in by-pass discussion.
>
> Thanks,
> St.Ack
>
>
>
>
>
> >
> >
> >> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
> >> ​​
> >>
> >> A bunch of us are making good progress on the next alpha release,
> >> > hbase-2.0.0-alpha-4. The theme for this release is "Fixing the
> >> Coprocessor
> >> > API", mostly undoing access accidentally granted Coprocessors. I'm
> >> talking
> >> > out loud about a particularly awkward item here rather than in a
> >> comment up
> >> > in JIRA so the airing sees a broader audience. Interested in any
> >> opinions
> >> > or input you might have.
> >> >
> >> > TL;DR MasterService/RegionServerService and Region, etc., Interfaces
> >> were
> >> > overloaded serving two, disparate roles; a load of refactoring has to
> be
> >> > done to undo the damage. Suggestions for how to avoid making same
> >> mistake
> >> > in future?
> >> >
> >> > I'm working on "HBASE-12260 MasterServices - remove from coprocessor
> API
> >> > (Discuss)". MasterServices started out as a subset of the Master
> >> > functionality. The idea back then was that certain Services and
> Managers
> >> > could make do w/ less-than-full-access to the HMaster process. If so,
> we
> >> > could test the Service and Manager without having to standup a full
> >> HMaster
> >> > instance (This usually required our putting up a cluster too). If
> >> > MasterServices had but a few methods, a Mock would be easy, making
> >> testing
> >> > easier still. We did the same thing on the RegionServer side where we
> >> had
> >> > RegionServerServices.
> >> >
> >> > MasterServices (and RegionServerServices) were also exposed to
> >> > Coprocessors. The idea was that CPs could ask-for particular Master
> >> > functions via MasterService. In this role MasterServices was meant to
> >> > constrain what CPs had access to.
> >> >
> >> > MasterServices therefore had two consumers; one internal, the other
> not.
> >> >
> >> > With time, MasterServices got fat as internal Services and Managers
> >> needed
> >> > more of HMaster. Everytime we added to MasterServices, CPs got access.
> >> >
> >> > On survey as part of the recent HBASE-12260 work, it turns out that
> the
> >> > bulk of the methods in MasterServices are actually annotated
> >> > InterfaceAudience.Private; i.e. for internal use only, not for
> >> > Coprocessors. A brutal purge of Private audience objects, makes for a
> >> > MasterServices we can pass Coprocessors but Coprocessors now have much
> >> less
> >> > facility available (for parts, there are alternatives; Andy review
> >> suggests
> >> > that CPs are severely crimped if this patch goes in). But
> MasterServices
> >> > can no longer be used for its original purpose, passing Services and
> >> > Managers a subset of HMaster. The latter brings on a substantial
> >> refactor.
> >> >
> >> > Another example of the double-role problem outlined above was found by
> >> Duo
> >> > and Anoop in the RegionServer Coprocessor refactor salt mine. They
> hit a
> >> > similar tangle. There was the RegionServerServices <=> MasterServices
> >> case
> >> > but also the exposure of HRegion internals. In this latter, Region was
> >> > introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
> >> > Coprocessors. Subsequently, we all confused his original intent and
> went
> >> > ahead and thought of Region (as opposed to HRegion) as an Interface
> for
> >> > HRegion and plumbed it throughout the code base in place of explicit
> >> > HRegion references. As Region picked up functionality, Coprocessors
> >> gained
> >> > more access.
> >> >
> >> > The refactoring pattern that has emerged out of the RegionServer-side
> >> > refactoring (which is ahead of the Master-work), is that we move to
> use
> >> the
> >> > HRegion implementation everywhere internally, undoing use of Region
> >> > Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp.
> >> I'm
> >> > following suit on the Master side moving to use HMaster in place of
> >> > MasterServices in all launched Services and Managers.
> >> >
> >> > How do we avoid this mistake in future? Should we rename Region as
> >> > CoprocessorRegion so it more plain that its consumer is Coprocessors?
> >> Ditto
> >> > on MasterServices?
> >> >
> >> > Thanks,
> >> > S
> >> >
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Andrew
> >>
> >> Words like orphans lost among the crosstalk, meaning torn from truth's
> >> decrepit hands
> >>    - A23, Crosstalk
> >>
> >
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: Looking for input on an alpha-4 thorny item

Posted by Andrew Purtell <ap...@apache.org>.
Here, and on the JIRAs. Maybe 'conversation' is an optimistic way to
describe the state of things so far. We should have a separate thread for
it. As far as I know, we don't have one yet.


On Mon, Oct 9, 2017 at 11:24 AM, Stack <st...@duboce.net> wrote:

> On Fri, Oct 6, 2017 at 9:10 AM, Stack <st...@duboce.net> wrote:
>
> > ....
> >
> >
> >> Some changes have been proposed that removes access to metrics (e.g.
> >> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
> >> function and replace it. Until and unless we remove the bypass semantic
> >> (under discussion) we should continue to allow CPs access to metrics
> >> objects so they can update metrics as expected by admins and users when
> >> replacing functionality (via bypass). Metrics are a public facing API. I
> >> agree this is kind of dodgy. I believe we should remove the bypass
> >> semantic. Once that is done, coprocessors can only mix in additional
> >> functionality. No more cause to touch core metrics. They can export
> their
> >> own metrics if so desired.
> >> ​​
> >>
> >
>
> Where is the by-pass conversation happening? On quick review, I am unable
> to find it. I'm interested given the above.
>
> Above you say metrics are 'public'. I agree that what we publish as metrics
> out over our jmx interface is public. What is interesting though is that
> all Metrics implementation classes are IA.private. And what we push out to
> operators is read-only.
>
> So, yeah, interested in by-pass discussion.
>
> Thanks,
> St.Ack
>
>
>
>
>
> >
> >
> >> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
> >> ​​
> >>
> >> A bunch of us are making good progress on the next alpha release,
> >> > hbase-2.0.0-alpha-4. The theme for this release is "Fixing the
> >> Coprocessor
> >> > API", mostly undoing access accidentally granted Coprocessors. I'm
> >> talking
> >> > out loud about a particularly awkward item here rather than in a
> >> comment up
> >> > in JIRA so the airing sees a broader audience. Interested in any
> >> opinions
> >> > or input you might have.
> >> >
> >> > TL;DR MasterService/RegionServerService and Region, etc., Interfaces
> >> were
> >> > overloaded serving two, disparate roles; a load of refactoring has to
> be
> >> > done to undo the damage. Suggestions for how to avoid making same
> >> mistake
> >> > in future?
> >> >
> >> > I'm working on "HBASE-12260 MasterServices - remove from coprocessor
> API
> >> > (Discuss)". MasterServices started out as a subset of the Master
> >> > functionality. The idea back then was that certain Services and
> Managers
> >> > could make do w/ less-than-full-access to the HMaster process. If so,
> we
> >> > could test the Service and Manager without having to standup a full
> >> HMaster
> >> > instance (This usually required our putting up a cluster too). If
> >> > MasterServices had but a few methods, a Mock would be easy, making
> >> testing
> >> > easier still. We did the same thing on the RegionServer side where we
> >> had
> >> > RegionServerServices.
> >> >
> >> > MasterServices (and RegionServerServices) were also exposed to
> >> > Coprocessors. The idea was that CPs could ask-for particular Master
> >> > functions via MasterService. In this role MasterServices was meant to
> >> > constrain what CPs had access to.
> >> >
> >> > MasterServices therefore had two consumers; one internal, the other
> not.
> >> >
> >> > With time, MasterServices got fat as internal Services and Managers
> >> needed
> >> > more of HMaster. Everytime we added to MasterServices, CPs got access.
> >> >
> >> > On survey as part of the recent HBASE-12260 work, it turns out that
> the
> >> > bulk of the methods in MasterServices are actually annotated
> >> > InterfaceAudience.Private; i.e. for internal use only, not for
> >> > Coprocessors. A brutal purge of Private audience objects, makes for a
> >> > MasterServices we can pass Coprocessors but Coprocessors now have much
> >> less
> >> > facility available (for parts, there are alternatives; Andy review
> >> suggests
> >> > that CPs are severely crimped if this patch goes in). But
> MasterServices
> >> > can no longer be used for its original purpose, passing Services and
> >> > Managers a subset of HMaster. The latter brings on a substantial
> >> refactor.
> >> >
> >> > Another example of the double-role problem outlined above was found by
> >> Duo
> >> > and Anoop in the RegionServer Coprocessor refactor salt mine. They
> hit a
> >> > similar tangle. There was the RegionServerServices <=> MasterServices
> >> case
> >> > but also the exposure of HRegion internals. In this latter, Region was
> >> > introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
> >> > Coprocessors. Subsequently, we all confused his original intent and
> went
> >> > ahead and thought of Region (as opposed to HRegion) as an Interface
> for
> >> > HRegion and plumbed it throughout the code base in place of explicit
> >> > HRegion references. As Region picked up functionality, Coprocessors
> >> gained
> >> > more access.
> >> >
> >> > The refactoring pattern that has emerged out of the RegionServer-side
> >> > refactoring (which is ahead of the Master-work), is that we move to
> use
> >> the
> >> > HRegion implementation everywhere internally, undoing use of Region
> >> > Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp.
> >> I'm
> >> > following suit on the Master side moving to use HMaster in place of
> >> > MasterServices in all launched Services and Managers.
> >> >
> >> > How do we avoid this mistake in future? Should we rename Region as
> >> > CoprocessorRegion so it more plain that its consumer is Coprocessors?
> >> Ditto
> >> > on MasterServices?
> >> >
> >> > Thanks,
> >> > S
> >> >
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Andrew
> >>
> >> Words like orphans lost among the crosstalk, meaning torn from truth's
> >> decrepit hands
> >>    - A23, Crosstalk
> >>
> >
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: Looking for input on an alpha-4 thorny item

Posted by Stack <st...@duboce.net>.
On Fri, Oct 6, 2017 at 9:10 AM, Stack <st...@duboce.net> wrote:

> ....
>
>
>> Some changes have been proposed that removes access to metrics (e.g.
>> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
>> function and replace it. Until and unless we remove the bypass semantic
>> (under discussion) we should continue to allow CPs access to metrics
>> objects so they can update metrics as expected by admins and users when
>> replacing functionality (via bypass). Metrics are a public facing API. I
>> agree this is kind of dodgy. I believe we should remove the bypass
>> semantic. Once that is done, coprocessors can only mix in additional
>> functionality. No more cause to touch core metrics. They can export their
>> own metrics if so desired.
>> ​​
>>
>

Where is the by-pass conversation happening? On quick review, I am unable
to find it. I'm interested given the above.

Above you say metrics are 'public'. I agree that what we publish as metrics
out over our jmx interface is public. What is interesting though is that
all Metrics implementation classes are IA.private. And what we push out to
operators is read-only.

So, yeah, interested in by-pass discussion.

Thanks,
St.Ack





>
>
>> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
>> ​​
>>
>> A bunch of us are making good progress on the next alpha release,
>> > hbase-2.0.0-alpha-4. The theme for this release is "Fixing the
>> Coprocessor
>> > API", mostly undoing access accidentally granted Coprocessors. I'm
>> talking
>> > out loud about a particularly awkward item here rather than in a
>> comment up
>> > in JIRA so the airing sees a broader audience. Interested in any
>> opinions
>> > or input you might have.
>> >
>> > TL;DR MasterService/RegionServerService and Region, etc., Interfaces
>> were
>> > overloaded serving two, disparate roles; a load of refactoring has to be
>> > done to undo the damage. Suggestions for how to avoid making same
>> mistake
>> > in future?
>> >
>> > I'm working on "HBASE-12260 MasterServices - remove from coprocessor API
>> > (Discuss)". MasterServices started out as a subset of the Master
>> > functionality. The idea back then was that certain Services and Managers
>> > could make do w/ less-than-full-access to the HMaster process. If so, we
>> > could test the Service and Manager without having to standup a full
>> HMaster
>> > instance (This usually required our putting up a cluster too). If
>> > MasterServices had but a few methods, a Mock would be easy, making
>> testing
>> > easier still. We did the same thing on the RegionServer side where we
>> had
>> > RegionServerServices.
>> >
>> > MasterServices (and RegionServerServices) were also exposed to
>> > Coprocessors. The idea was that CPs could ask-for particular Master
>> > functions via MasterService. In this role MasterServices was meant to
>> > constrain what CPs had access to.
>> >
>> > MasterServices therefore had two consumers; one internal, the other not.
>> >
>> > With time, MasterServices got fat as internal Services and Managers
>> needed
>> > more of HMaster. Everytime we added to MasterServices, CPs got access.
>> >
>> > On survey as part of the recent HBASE-12260 work, it turns out that the
>> > bulk of the methods in MasterServices are actually annotated
>> > InterfaceAudience.Private; i.e. for internal use only, not for
>> > Coprocessors. A brutal purge of Private audience objects, makes for a
>> > MasterServices we can pass Coprocessors but Coprocessors now have much
>> less
>> > facility available (for parts, there are alternatives; Andy review
>> suggests
>> > that CPs are severely crimped if this patch goes in). But MasterServices
>> > can no longer be used for its original purpose, passing Services and
>> > Managers a subset of HMaster. The latter brings on a substantial
>> refactor.
>> >
>> > Another example of the double-role problem outlined above was found by
>> Duo
>> > and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a
>> > similar tangle. There was the RegionServerServices <=> MasterServices
>> case
>> > but also the exposure of HRegion internals. In this latter, Region was
>> > introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
>> > Coprocessors. Subsequently, we all confused his original intent and went
>> > ahead and thought of Region (as opposed to HRegion) as an Interface for
>> > HRegion and plumbed it throughout the code base in place of explicit
>> > HRegion references. As Region picked up functionality, Coprocessors
>> gained
>> > more access.
>> >
>> > The refactoring pattern that has emerged out of the RegionServer-side
>> > refactoring (which is ahead of the Master-work), is that we move to use
>> the
>> > HRegion implementation everywhere internally, undoing use of Region
>> > Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp.
>> I'm
>> > following suit on the Master side moving to use HMaster in place of
>> > MasterServices in all launched Services and Managers.
>> >
>> > How do we avoid this mistake in future? Should we rename Region as
>> > CoprocessorRegion so it more plain that its consumer is Coprocessors?
>> Ditto
>> > on MasterServices?
>> >
>> > Thanks,
>> > S
>> >
>>
>>
>>
>> --
>> Best regards,
>> Andrew
>>
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>>    - A23, Crosstalk
>>
>
>

Re: Looking for input on an alpha-4 thorny item

Posted by Stack <st...@duboce.net>.
On Wed, Oct 4, 2017 at 3:51 PM, Andrew Purtell <ap...@apache.org> wrote:

> I think it is fine to rebrand these interfaces as for coprocessors and tag
> them LP(COPROC):
>
>     Region (use HRegion in internals)
>
>     Store (use HStore in internals)
>
>     MasterServices (use HMaster in internals)
>
>     RegionServerServices (use HRegionServer in internals)
>
> and pare them down. This is not a complete list, just a handful of
> examples.
>
>
Will do. Just to say that it means a bunch of refactoring undoing reliance
on Interfaces (We have too many services and managers!)


> Some specific points of feedback I have had:
>
> In Region, it would be good for CPs to be able to schedule async flushes
> and compactions, poll or wait for completion of a specific request, or wait
> for all pending flushes and compactions to complete. There is a Phoenix use
> case for this in indexing.
>
>
The means are in place. Will make sure it happens.



> Security coprocessors use MasterServices to create their system tables.
> Maybe this can be replaced by using the normal admin API for same.
>
>
Yes. Working on this too.



> When removing access to internal services, consider if there are client API
> equivalents that the CP can use, and if embedded calls to such client APIs
> from the coprocessor context would be a good idea. CP invocation of an
> internal service is simply an in-process method call. That's good and bad,
> right? The bad part, direct access, is the thing we want to restrict, the
> motivation for this work (in part). But the good thing is it avoids a lot
> of the fat client logic unnecessary for all-in-process service invocation,
> which might even not work correctly. Removing everything is as drastic as
> allowing CPs access to everything. It could be fine to drastically pare
> down, but please consider it.
>
>
The short-circuit of RPC (and PB) we have where when a Client is on the
target host, the invocation is direct helps here. Moving the
RegionServerGroup feature over to the new cut-down Coprocessor API has
turned up some holes in our Admin Interface -- i.e. there is no getServers
method unless you go via ClusterReport -- that I'm backfilling so RSGroups
can use Admin Interface instead of now-removed CP API. But also, have had
to add back a few items to MasterServices in a more constrained form.


> Some changes have been proposed that removes access to metrics (e.g.
> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
> function and replace it. Until and unless we remove the bypass semantic
> (under discussion) we should continue to allow CPs access to metrics
> objects so they can update metrics as expected by admins and users when
> replacing functionality (via bypass). Metrics are a public facing API. I
> agree this is kind of dodgy. I believe we should remove the bypass
> semantic. Once that is done, coprocessors can only mix in additional
> functionality. No more cause to touch core metrics. They can export their
> own metrics if so desired.
> ​​
>
>
Will be back to ping you on this one. Yeah, I think it dodgy CPs updating
internal metrics. Will look at the radical removal of bypass too... Will be
back.

Thanks Andrew,
S




> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
> ​​
>
> A bunch of us are making good progress on the next alpha release,
> > hbase-2.0.0-alpha-4. The theme for this release is "Fixing the
> Coprocessor
> > API", mostly undoing access accidentally granted Coprocessors. I'm
> talking
> > out loud about a particularly awkward item here rather than in a comment
> up
> > in JIRA so the airing sees a broader audience. Interested in any opinions
> > or input you might have.
> >
> > TL;DR MasterService/RegionServerService and Region, etc., Interfaces
> were
> > overloaded serving two, disparate roles; a load of refactoring has to be
> > done to undo the damage. Suggestions for how to avoid making same mistake
> > in future?
> >
> > I'm working on "HBASE-12260 MasterServices - remove from coprocessor API
> > (Discuss)". MasterServices started out as a subset of the Master
> > functionality. The idea back then was that certain Services and Managers
> > could make do w/ less-than-full-access to the HMaster process. If so, we
> > could test the Service and Manager without having to standup a full
> HMaster
> > instance (This usually required our putting up a cluster too). If
> > MasterServices had but a few methods, a Mock would be easy, making
> testing
> > easier still. We did the same thing on the RegionServer side where we had
> > RegionServerServices.
> >
> > MasterServices (and RegionServerServices) were also exposed to
> > Coprocessors. The idea was that CPs could ask-for particular Master
> > functions via MasterService. In this role MasterServices was meant to
> > constrain what CPs had access to.
> >
> > MasterServices therefore had two consumers; one internal, the other not.
> >
> > With time, MasterServices got fat as internal Services and Managers
> needed
> > more of HMaster. Everytime we added to MasterServices, CPs got access.
> >
> > On survey as part of the recent HBASE-12260 work, it turns out that the
> > bulk of the methods in MasterServices are actually annotated
> > InterfaceAudience.Private; i.e. for internal use only, not for
> > Coprocessors. A brutal purge of Private audience objects, makes for a
> > MasterServices we can pass Coprocessors but Coprocessors now have much
> less
> > facility available (for parts, there are alternatives; Andy review
> suggests
> > that CPs are severely crimped if this patch goes in). But MasterServices
> > can no longer be used for its original purpose, passing Services and
> > Managers a subset of HMaster. The latter brings on a substantial
> refactor.
> >
> > Another example of the double-role problem outlined above was found by
> Duo
> > and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a
> > similar tangle. There was the RegionServerServices <=> MasterServices
> case
> > but also the exposure of HRegion internals. In this latter, Region was
> > introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
> > Coprocessors. Subsequently, we all confused his original intent and went
> > ahead and thought of Region (as opposed to HRegion) as an Interface for
> > HRegion and plumbed it throughout the code base in place of explicit
> > HRegion references. As Region picked up functionality, Coprocessors
> gained
> > more access.
> >
> > The refactoring pattern that has emerged out of the RegionServer-side
> > refactoring (which is ahead of the Master-work), is that we move to use
> the
> > HRegion implementation everywhere internally, undoing use of Region
> > Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp. I'm
> > following suit on the Master side moving to use HMaster in place of
> > MasterServices in all launched Services and Managers.
> >
> > How do we avoid this mistake in future? Should we rename Region as
> > CoprocessorRegion so it more plain that its consumer is Coprocessors?
> Ditto
> > on MasterServices?
> >
> > Thanks,
> > S
> >
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>

Re: Looking for input on an alpha-4 thorny item

Posted by Andrew Purtell <ap...@apache.org>.
I think it is fine to rebrand these interfaces as for coprocessors and tag
them LP(COPROC):

    Region (use HRegion in internals)

    Store (use HStore in internals)

    MasterServices (use HMaster in internals)

    RegionServerServices (use HRegionServer in internals)

and pare them down. This is not a complete list, just a handful of examples.

Some specific points of feedback I have had:

In Region, it would be good for CPs to be able to schedule async flushes
and compactions, poll or wait for completion of a specific request, or wait
for all pending flushes and compactions to complete. There is a Phoenix use
case for this in indexing.

Security coprocessors use MasterServices to create their system tables.
Maybe this can be replaced by using the normal admin API for same.

When removing access to internal services, consider if there are client API
equivalents that the CP can use, and if embedded calls to such client APIs
from the coprocessor context would be a good idea. CP invocation of an
internal service is simply an in-process method call. That's good and bad,
right? The bad part, direct access, is the thing we want to restrict, the
motivation for this work (in part). But the good thing is it avoids a lot
of the fat client logic unnecessary for all-in-process service invocation,
which might even not work correctly. Removing everything is as drastic as
allowing CPs access to everything. It could be fine to drastically pare
down, but please consider it.

Some changes have been proposed that removes access to metrics (e.g.
RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
function and replace it. Until and unless we remove the bypass semantic
(under discussion) we should continue to allow CPs access to metrics
objects so they can update metrics as expected by admins and users when
replacing functionality (via bypass). Metrics are a public facing API. I
agree this is kind of dodgy. I believe we should remove the bypass
semantic. Once that is done, coprocessors can only mix in additional
functionality. No more cause to touch core metrics. They can export their
own metrics if so desired.
​​

On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
​​

A bunch of us are making good progress on the next alpha release,
> hbase-2.0.0-alpha-4. The theme for this release is "Fixing the Coprocessor
> API", mostly undoing access accidentally granted Coprocessors. I'm talking
> out loud about a particularly awkward item here rather than in a comment up
> in JIRA so the airing sees a broader audience. Interested in any opinions
> or input you might have.
>
> TL;DR MasterService/RegionServerService and Region, etc., Interfaces were
> overloaded serving two, disparate roles; a load of refactoring has to be
> done to undo the damage. Suggestions for how to avoid making same mistake
> in future?
>
> I'm working on "HBASE-12260 MasterServices - remove from coprocessor API
> (Discuss)". MasterServices started out as a subset of the Master
> functionality. The idea back then was that certain Services and Managers
> could make do w/ less-than-full-access to the HMaster process. If so, we
> could test the Service and Manager without having to standup a full HMaster
> instance (This usually required our putting up a cluster too). If
> MasterServices had but a few methods, a Mock would be easy, making testing
> easier still. We did the same thing on the RegionServer side where we had
> RegionServerServices.
>
> MasterServices (and RegionServerServices) were also exposed to
> Coprocessors. The idea was that CPs could ask-for particular Master
> functions via MasterService. In this role MasterServices was meant to
> constrain what CPs had access to.
>
> MasterServices therefore had two consumers; one internal, the other not.
>
> With time, MasterServices got fat as internal Services and Managers needed
> more of HMaster. Everytime we added to MasterServices, CPs got access.
>
> On survey as part of the recent HBASE-12260 work, it turns out that the
> bulk of the methods in MasterServices are actually annotated
> InterfaceAudience.Private; i.e. for internal use only, not for
> Coprocessors. A brutal purge of Private audience objects, makes for a
> MasterServices we can pass Coprocessors but Coprocessors now have much less
> facility available (for parts, there are alternatives; Andy review suggests
> that CPs are severely crimped if this patch goes in). But MasterServices
> can no longer be used for its original purpose, passing Services and
> Managers a subset of HMaster. The latter brings on a substantial refactor.
>
> Another example of the double-role problem outlined above was found by Duo
> and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a
> similar tangle. There was the RegionServerServices <=> MasterServices case
> but also the exposure of HRegion internals. In this latter, Region was
> introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
> Coprocessors. Subsequently, we all confused his original intent and went
> ahead and thought of Region (as opposed to HRegion) as an Interface for
> HRegion and plumbed it throughout the code base in place of explicit
> HRegion references. As Region picked up functionality, Coprocessors gained
> more access.
>
> The refactoring pattern that has emerged out of the RegionServer-side
> refactoring (which is ahead of the Master-work), is that we move to use the
> HRegion implementation everywhere internally, undoing use of Region
> Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp. I'm
> following suit on the Master side moving to use HMaster in place of
> MasterServices in all launched Services and Managers.
>
> How do we avoid this mistake in future? Should we rename Region as
> CoprocessorRegion so it more plain that its consumer is Coprocessors? Ditto
> on MasterServices?
>
> Thanks,
> S
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk