Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/12/27 20:50:14 UTC
[jira] [Comment Edited] (HBASE-11125) Introduce a higher level interface for registering interest in coprocessor upcalls

    [ https://issues.apache.org/jira/browse/HBASE-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259470#comment-14259470 ] 

Andrew Purtell edited comment on HBASE-11125 at 12/27/14 7:49 PM:
------------------------------------------------------------------

The core principle of the current coprocessor API is minimization of overhead. We have a “kernel hook” API where execution of extension code takes place in the current thread to avoid a context switch and copying, using low level types to avoid translation costs, allocations and copying. This is why the current API has been successful, and we want to retain it, but as a result of this choice:
# Misbehaving code can take down the server.
# Many low level types that do not and cannot have compatibility guarantees are exposed to coprocessor applications.
# Interfaces like RegionObserver carry a lot of internal details that might be unrelated to the task(s) at hand.

This issue focuses on the latter two problems. (The first can be addressed by HBASE-4047.)

A proposal.

Create a new API based around an interface called Extension. Extension can knit together coprocessors and plugins.

Extensions would have a method, called at load time, that returns a list of objects whose types express intentions. Intention types would be fine-grained, expressing:
- A request to listen for an event (read only), a _xxx_Listener, either globally or on a per-table basis
- A request to intercept an event (read with possible modification or drop), a _xxx_Transformer, either globally or on a per-table basis
- A request to implement an Endpoint interface (or part of one?)

As a rule of thumb we would define one intention type for each:
- Invocation of a method of an Observer: _xxx_Transformer for pre hooks, _xxx_Listener for post hooks, e.g. DeleteTransformer -> preDelete, DeleteListener -> postDelete
- Invocation of a method of a plugin: flush policy, compaction policy, split policy, etc. 
- Endpoint
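
To make the shape of this concrete, here is a minimal sketch of what an Extension and two intention types could look like. Every name in it (Intention, PutListener, PutTransformer, AuditExtension, and the PutSketch stand-in for Put) is hypothetical and chosen only for illustration; nothing here exists in HBase today, and per-table scoping is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types exist in HBase.

/** Simplified stand-in for an HBase Put: a row key and one value. */
final class PutSketch {
    final String row;
    String value;

    PutSketch(String row, String value) {
        this.row = row;
        this.value = value;
    }
}

/** Marker for anything an Extension can register. */
interface Intention {}

/** Read-only interest in a completed put (would map to postPut). */
interface PutListener extends Intention {
    void onPut(PutSketch put);
}

/** May modify a put in place before it is applied (would map to prePut). */
interface PutTransformer extends Intention {
    /** Returns false to drop the put. */
    boolean transform(PutSketch put);
}

/** The load-time method returns objects whose types express intent. */
interface Extension {
    List<Intention> intentions();
}

/** Example extension: upper-cases values and counts completed puts. */
final class AuditExtension implements Extension {
    int seen = 0;

    @Override
    public List<Intention> intentions() {
        PutTransformer upperCase = put -> {
            put.value = put.value.toUpperCase();
            return true;
        };
        PutListener counter = put -> seen++;
        return Arrays.asList(upperCase, counter);
    }
}
```

The loader would inspect the runtime type of each returned object to decide which low level hook or plugin site it belongs to.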

A naive implementation would maintain lists of intentions at various hook points. For each operation perhaps several lists would need to be walked and processed in turn. I think we can do better and maintain the performance of the current API.
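
For reference, the naive approach might look roughly like the sketch below. It is only an illustration of the list-walking cost: the NaivePutHooks class is hypothetical, and a plain String stands in for a Put.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/** Hypothetical naive dispatcher: one list per intention kind at the put hook points. */
final class NaivePutHooks {
    // A String stands in for a Put to keep the sketch self-contained.
    private final List<UnaryOperator<String>> transformers = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void addTransformer(UnaryOperator<String> t) {
        transformers.add(t);
    }

    void addListener(Consumer<String> l) {
        listeners.add(l);
    }

    /** Walks every transformer in turn; a null result drops the operation. */
    String prePut(String put) {
        String current = put;
        for (UnaryOperator<String> t : transformers) {
            current = t.apply(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Walks every listener in turn; listeners cannot change the put. */
    void postPut(String put) {
        for (Consumer<String> l : listeners) {
            l.accept(put);
        }
    }
}
```

Every operation pays for an iteration and an indirect call per registered intention, which is the overhead the code generation idea tries to avoid.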

An Extension ClassLoader could generate code for wiring up intentions to low level hooks or plugin sites. For example, if we have several intentions that map to RegionObserver methods, we would codegen a BaseRegionObserver subclass, folding in the bytecode of the intentions, and install it. Or, if we find an intention to override the split policy, we would codegen a delegating split policy implementation, folding in the bytecode of the intention and delegating everything else to whatever plugin is already installed, then install the result.
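
As a rough picture of the split policy case, the following is a hand-written equivalent of the delegating class a generator might emit. It does no bytecode generation, and the types (SplitPolicySketch, SplitDecisionIntention, SizeSplitPolicy, DelegatingSplitPolicy) are simplified, hypothetical stand-ins rather than the real HBase split policy classes.

```java
/** Hypothetical, simplified stand-in for a split policy plugin. */
interface SplitPolicySketch {
    boolean shouldSplit(long regionSizeBytes);

    String describe();
}

/** Hypothetical intention that overrides only the split decision. */
interface SplitDecisionIntention {
    boolean shouldSplit(long regionSizeBytes);
}

/** Stand-in for whatever policy is already installed. */
final class SizeSplitPolicy implements SplitPolicySketch {
    private final long thresholdBytes;

    SizeSplitPolicy(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    @Override
    public boolean shouldSplit(long regionSizeBytes) {
        return regionSizeBytes > thresholdBytes;
    }

    @Override
    public String describe() {
        return "size>" + thresholdBytes;
    }
}

/** Hand-written form of the wrapper a code generator might produce. */
final class DelegatingSplitPolicy implements SplitPolicySketch {
    private final SplitPolicySketch installed;
    private final SplitDecisionIntention intention;

    DelegatingSplitPolicy(SplitPolicySketch installed, SplitDecisionIntention intention) {
        this.installed = installed;
        this.intention = intention;
    }

    @Override
    public boolean shouldSplit(long regionSizeBytes) {
        // The one method the intention overrides.
        return intention.shouldSplit(regionSizeBytes);
    }

    @Override
    public String describe() {
        // Everything else is delegated to the installed plugin.
        return installed.describe();
    }
}
```

A real generator would inline the intention's bytecode into shouldSplit instead of calling through an interface, but the delegation structure would be the same.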

It will not be necessary to have complete coverage of all coprocessor hooks in the collection of intention types for the higher level API to be useful. We should start with straightforward cases and then extend it over time. Consider RegionObserver#preBatchMutate. We don't want to expose MiniBatchOperationInProgress, because it is too tied to low level details of how the regionserver processes batch RPCs. Instead, we'd collect intentions scoped narrowly to mutation types (Append, Increment, Put) and synthesize a hook for preBatchMutate as needed. Or, consider RegionObserver#preCheckAndDelete. We might want to combine Get and Delete intentions into a synthetic hook for preCheckAndDelete, but not have an explicit CheckAndDelete intention, which would expose an RPC detail. Design for different cases can be done in subtasks.
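
One way to picture the preBatchMutate case is a synthetic hook that routes each mutation in a batch to the intention registered for its type, so extension code never sees the batch container. The sketch below is an assumption about how that could be structured. SyntheticBatchHook and the *Sketch mutation classes are hypothetical stand-ins, and it keeps at most one transformer per mutation type for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for HBase mutation types.
abstract class MutationSketch {
    final String row;

    MutationSketch(String row) {
        this.row = row;
    }
}

final class PutSketch extends MutationSketch {
    final String value;

    PutSketch(String row, String value) {
        super(row);
        this.value = value;
    }
}

final class IncrementSketch extends MutationSketch {
    final long amount;

    IncrementSketch(String row, long amount) {
        super(row);
        this.amount = amount;
    }
}

/** Hypothetical synthetic batch hook built from per-type intentions. */
final class SyntheticBatchHook {
    // At most one transformer per mutation type; registering again replaces it.
    private final Map<Class<?>, UnaryOperator<MutationSketch>> byType = new HashMap<>();

    <M extends MutationSketch> void register(Class<M> type, UnaryOperator<M> transformer) {
        byType.put(type, m -> transformer.apply(type.cast(m)));
    }

    /** Applies the matching transformer to each mutation; null results are dropped. */
    List<MutationSketch> preBatchMutate(List<MutationSketch> batch) {
        List<MutationSketch> kept = new ArrayList<>();
        for (MutationSketch m : batch) {
            UnaryOperator<MutationSketch> t = byType.get(m.getClass());
            MutationSketch result = (t == null) ? m : t.apply(m);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }
}
```

Mutation types with no registered intention pass through untouched, which matches the idea of synthesizing the hook only as needed.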

Code generation allows us to decouple intention types from internals. For example, a PutTransformer would be installed as a RegionObserver with an implemented prePut method. This is what prePut hooks look like today:

{code}
void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, Durability durability)
{code}

Ideally the PutTransformer intention type should only know about the Put type and have a reference to a context if it needs to be stateful. We can carefully add state to the intention type for controlling durability. We should have a separate intention for modifying WALEdits. We can do this without leaking out the WALEdit type. Yet the "transformer" code would run in a prePut hook and get good performance. We could even change the signature of RegionObserver#prePut at any time, provided the code generator that maps intentions to low level implementation is updated likewise (setting aside other considerations for the moment).
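
A minimal sketch of that decoupling follows, written by hand in place of generated code. All types are hypothetical stand-ins (PutSketch, WalEditSketch, DurabilitySketch, ContextSketch, LowLevelPutHook, GeneratedPutObserver); the hook interface only mirrors the parameter list of the prePut signature shown above and is not the real RegionObserver.

```java
// Hypothetical stand-ins; the real HBase types are deliberately not used here.
final class PutSketch {
    final String row;
    String value;

    PutSketch(String row, String value) {
        this.row = row;
        this.value = value;
    }
}

final class WalEditSketch {}

enum DurabilitySketch { USE_DEFAULT, SKIP_WAL, SYNC_WAL }

/** Stand-in for the observer context; it only records a bypass request. */
final class ContextSketch {
    boolean bypassed = false;
}

/** The intention type knows about the put and nothing else. */
interface PutTransformer {
    /** Returns false to drop the put. */
    boolean transform(PutSketch put);
}

/** Mirrors the parameter list of the low level prePut hook. */
interface LowLevelPutHook {
    void prePut(ContextSketch c, PutSketch put, WalEditSketch edit, DurabilitySketch durability);
}

/** Hand-written equivalent of what a code generator might emit. */
final class GeneratedPutObserver implements LowLevelPutHook {
    private final PutTransformer transformer;

    GeneratedPutObserver(PutTransformer transformer) {
        this.transformer = transformer;
    }

    @Override
    public void prePut(ContextSketch c, PutSketch put, WalEditSketch edit,
                       DurabilitySketch durability) {
        // Only the put crosses into extension code; edit and durability stay internal.
        if (!transformer.transform(put)) {
            c.bypassed = true;
        }
    }
}
```

If the low level prePut signature changed, only GeneratedPutObserver (that is, the generator's template) would need to change; the PutTransformer written by the extension author would be untouched.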

We would aim for code generation that can be maintained by committers who are not experts in JVM internals. That said, some complexity is unavoidable. I think the promise of composability of fine grained intentions, API-level supportability of hiding internal types, and the implied performance of “inlining” intentions into straight line code for low level hooks could be well worth it. We can mitigate maintenance risks by placing the Extension API and code generator into their own Maven module. This module would provide a system level coprocessor that must be installed via site configuration for experimental “Extension” API support. It would be optional and decoupled from the client and server core modules.

Because we are keeping the low level "kernel-hook"-style API, the lack of access to internal types and the lack of functional coverage in a higher level API wouldn't be a problem. An implementor could always resort to direct use of low level interfaces. Of course, we would still want to figure out how to implement the desired extension in higher level terms.


> Introduce a higher level interface for registering interest in coprocessor upcalls
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-11125
>                 URL: https://issues.apache.org/jira/browse/HBASE-11125
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Priority: Critical
>
> We should introduce a higher level interface for managing the registration of 'user' code for execution from the low level hooks. It should not be necessary for coprocessor implementers to learn the universe of available low level hooks and the subtleties of their placement within HBase core code. Instead the higher level API should allow the implementer to describe their intent and then this API should choose the appropriate low level hook placement.
> A very desirable side effect is a layer of indirection between coprocessor implementers and the actual hooks. This will address the perennial complaint that the low level hooks change too much from release to release, as recently discussed during the RM panel at HBaseCon. If we try to avoid changing the particular placement and arguments of hook functions in response to those complaints, this can be an onerous constraint on necessary internals evolution. Instead we can direct coprocessor implementers to consider the new API and provide the same interface stability guarantees there as we do for client API, 


