You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hbase.apache.org by ga...@apache.org on 2014/06/27 20:42:06 UTC
git commit: HBASE-11285 Expand coprocessor info in ref guide (Misty
Stanley-Jones)
Repository: hbase
Updated Branches:
refs/heads/master 7c1135b3e -> bf827a033
HBASE-11285 Expand coprocessor info in ref guide (Misty Stanley-Jones)
Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/bf827a03
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/bf827a03
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/bf827a03
Branch: refs/heads/master
Commit: bf827a0331fe3294a11b5f0257093110503c3efc
Parents: 7c1135b
Author: Gary Helmling <ga...@apache.org>
Authored: Fri Jun 27 11:11:40 2014 -0700
Committer: Gary Helmling <ga...@apache.org>
Committed: Fri Jun 27 11:11:40 2014 -0700
----------------------------------------------------------------------
src/main/docbkx/cp.xml | 380 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 376 insertions(+), 4 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/hbase/blob/bf827a03/src/main/docbkx/cp.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/cp.xml b/src/main/docbkx/cp.xml
index 0dac7b7..9cc0859 100644
--- a/src/main/docbkx/cp.xml
+++ b/src/main/docbkx/cp.xml
@@ -29,9 +29,381 @@
*/
-->
<title>Apache HBase Coprocessors</title>
- <para>The idea of HBase coprocessors was inspired by Google's BigTable coprocessors. The <link
- xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction">Apache HBase Blog
- on Coprocessor</link> is a very good documentation on that. It has detailed information about
- the coprocessor framework, terminology, management, and so on. </para>
+ <para> HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable<footnote>
+ <para><link
+ xlink:href="http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009" />, pages
+ 66-67.</para>
+ </footnote>. Coprocessors function in a similar way to Linux kernel modules. They provide a way
+ to run server-level code against locally-stored data. The functionality they provide is very
+ powerful, but also carries great risk and can have adverse effects on the system, at the level
+ of the operating system. The information in this chapter is primarily sourced and heavily reused
+ from Mingjie Lai's blog post at <link
+ xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction" />. </para>
+ <para> Coprocessors are not designed to be used by end users of HBase, but by HBase developers who
+ need to add specialized functionality to HBase. One example of the use of coprocessors is
+ pluggable compaction and scan policies, which are provided as coprocessors in <link
+ xlink:href="HBASE-6427">HBASE-6427</link>. </para>
+
+ <section>
+ <title>Coprocessor Framework</title>
+ <para>The implementation of HBase coprocessors diverges from the BigTable implementation. The
+ HBase framework provides a library and runtime environment for executing user code within the
+ HBase region server and master processes. </para>
+ <para> The framework API is provided in the <link
+ xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html">coprocessor</link>
+ package.</para>
+ <para>Two different types of coprocessors are provided by the framework, based on their
+ scope.</para>
+ <variablelist>
+ <title>Types of Coprocessors</title>
+ <varlistentry>
+ <term>System Coprocessors</term>
+ <listitem>
+ <para>System coprocessors are loaded globally on all tables and regions hosted by a region
+ server.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Table Coprocessors</term>
+ <listitem>
+ <para>You can specify which coprocessors should be loaded on all regions for a table on a
+ per-table basis.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>The framework provides two different aspects of extensions as well:
+ <firstterm>observers</firstterm> and <firstterm>endpoints</firstterm>.</para>
+ <variablelist>
+ <varlistentry>
+ <term>Observers</term>
+ <listitem>
+ <para>Observers are analogous to triggers in conventional databases. They allow you to
+ insert user code by overriding upcall methods provided by the coprocessor framework.
+ Callback functions are executed from core HBase code when events occur. Callbacks are
+ handled by the framework, and the coprocessor itself only needs to insert the extended
+ or alternate functionality.</para>
+ <variablelist>
+ <title>Provided Observer Interfaces</title>
+ <varlistentry>
+ <term><link
+ xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html">RegionObserver</link></term>
+ <listitem>
+ <para>A RegionObserver provides hooks for data manipulation events, such as Get,
+ Put, Delete, Scan. An instance of a RegionObserver coprocessor exists for each
+ table region. The scope of the observations a RegionObserver can make is
+ constrained to that region. </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><link
+ xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html">RegionServerObserver</link></term>
+ <listitem>
+ <para>A RegionServerObserver provides for operations related to the RegionServer,
+ such as stopping the RegionServer and performing operations before or after
+ merges, commits, or rollbacks.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><link
+ xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html">WALObserver</link></term>
+ <listitem>
+ <para>A WALObserver provides hooks for operations related to the write-ahead log
+ (WAL). You can observe or intercept WAL writing and reconstruction events. A
+ WALObserver runs in the context of WAL processing. A single WALObserver exists on
+ a single region server.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><link
+ xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html">MasterObserver</link></term>
+ <listitem>
+ <para>A MasterObserver provides hooks for DDL-type operation, such as create,
+ delete, modify table. The MasterObserver runs within the context of the HBase
+ master. </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>More than one observer of a given type can be loaded at once. Multiple observers are
+ chained to execute sequentially by order of assigned priority. Nothing prevents a
+ coprocessor implementor from communicating internally among its installed
+ observers.</para>
+ <para>An observer of a higher priority can preempt lower-priority observers by throwing an
+ IOException or a subclass of IOException.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Endpoints (HBase 0.96.x and later)</term>
+ <listitem>
+ <para>The implementation for endpoints changed significantly in HBase 0.96.x due to the
+ introduction of protocol buffers (protobufs) (<link
+ xlink:href="https://issues.apache.org/jira/browse/HBASE-5448">HBASE-5488</link>). If
+ you created endpoints before 0.96.x, you will need to rewrite them. Endpoints are now
+ defined and callable as protobuf services, rather than endpoint invocations passed
+ through as Writable blobs</para>
+ <para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
+ time from the client. When it is invoked, it is executed remotely at the target region
+ or regions, and results of the executions are returned to the client.</para>
+ <para>The endpoint implementation is installed on the server and is invoked using HBase
+ RPC. The client library provides convenience methods for invoking these dynamic
+ interfaces. </para>
+ <para>An endpoint, like an observer, can communicate with any installed observers. This
+ allows you to plug new features into HBase without modifying or recompiling HBase
+ itself.</para>
+ <itemizedlist>
+ <title>Steps to Implement an Endpoint</title>
+ <listitem><para>Define the coprocessor service and related messages in a <filename>.proto</filename> file</para></listitem>
+ <listitem><para>Run the <command>protoc</command> command to generate the code.</para></listitem>
+ <listitem><para>Write code to implement the following:</para>
+ <itemizedlist>
+ <listitem><para>the generated protobuf Service interface</para></listitem>
+ <listitem>
+ <para>the new <link
+ xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#coprocessorService(byte[])">org.apache.hadoop.hbase.coprocessor.CoprocessorService</link>
+ interface (required for the <link
+ xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.html">RegionCoprocessorHost</link>
+ to register the exposed service)</para></listitem>
+ </itemizedlist>
+ </listitem>
+ <listitem><para>The client calls the new HTable.coprocessorService() methods to perform the endpoint RPCs.</para></listitem>
+ </itemizedlist>
+
+ <para>For more information and examples, refer to the API documentation for the <link
+ xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/coprocessor/package-summary.html">coprocessor</link>
+ package, as well as the included RowCount example in the
+ <filename>/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/</filename>
+ of the HBase source.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>Endpoints (HBase 0.94.x and earlier)</term>
+ <listitem>
+ <para>Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any
+ time from the client. When it is invoked, it is executed remotely at the target region
+ or regions, and results of the executions are returned to the client.</para>
+ <para>The endpoint implementation is installed on the server and is invoked using HBase
+ RPC. The client library provides convenience methods for invoking these dynamic
+ interfaces. </para>
+ <para>An endpoint, like an observer, can communicate with any installed observers. This
+ allows you to plug new features into HBase without modifying or recompiling HBase
+ itself.</para>
+ <itemizedlist>
+ <title>Steps to Implement an Endpoint</title>
+ <listitem>
+ <bridgehead>Server-Side</bridgehead>
+ <itemizedlist>
+ <listitem>
+ <para>Create new protocol interface which extends CoprocessorProtocol.</para>
+ </listitem>
+ <listitem>
+ <para>Implement the Endpoint interface. The implementation will be loaded into and
+ executed from the region context.</para>
+ </listitem>
+ <listitem>
+ <para>Extend the abstract class BaseEndpointCoprocessor. This convenience class
+ hides some internal details that the implementer does not need to be concerned
+ about, ˆ such as coprocessor framework class loading.</para>
+ </listitem>
+ </itemizedlist>
+ </listitem>
+ <listitem>
+ <bridgehead>Client-Side</bridgehead>
+ <para>Endpoint can be invoked by two new HBase client APIs:</para>
+ <itemizedlist>
+ <listitem>
+ <para><code>HTableInterface.coprocessorProxy(Class<T> protocol, byte[]
+ row)</code> for executing against a single region</para>
+ </listitem>
+ <listitem>
+ <para><code>HTableInterface.coprocessorExec(Class<T> protocol, byte[]
+ startKey, byte[] endKey, Batch.Call<T,R> callable)</code> for executing
+ over a range of regions</para>
+ </listitem>
+ </itemizedlist>
+ </listitem>
+ </itemizedlist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </section>
+
+ <section>
+ <title>Examples</title>
+ <para>An example of an observer is included in
+ <filename>hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestZooKeeperScanPolicyObserver.java</filename>.
+ Several endpoint examples are included in the same directory.</para>
+ </section>
+
+
+
+ <section>
+ <title>Building A Coprocessor</title>
+
+ <para>Before you can build a processor, it must be developed, compiled, and packaged in a JAR
+ file. The next step is to configure the coprocessor framework to use your coprocessor. You can
+ load the coprocessor from your HBase configuration, so that the coprocessor starts with HBase,
+ or you can configure the coprocessor from the HBase shell, as a table attribute, so that it is
+ loaded dynamically when the table is opened or reopened.</para>
+ <section>
+ <title>Load from Configuration</title>
+ <para> To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's
+ <filename>hbase-site.xml</filename> and configure one of the following properties, based
+ on the type of observer you are configuring: </para>
+ <itemizedlist>
+ <listitem>
+ <para><code>hbase.coprocessor.region.classes</code>for RegionObservers and
+ Endpoints</para>
+ </listitem>
+ <listitem>
+ <para><code>hbase.coprocessor.wal.classes</code>for WALObservers</para>
+ </listitem>
+ <listitem>
+ <para><code>hbase.coprocessor.master.classes</code>for MasterObservers</para>
+ </listitem>
+ </itemizedlist>
+ <example>
+ <title>Example RegionObserver Configuration</title>
+ <para>In this example, one RegionObserver is configured for all the HBase tables.</para>
+ <screen><![CDATA[
+<property>
+ <name>hbase.coprocessor.region.classes</name>
+ <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
+ </property> ]]>
+ </screen>
+ </example>
+
+ <para> If multiple classes are specified for loading, the class names must be comma-separated.
+ The framework attempts to load all the configured classes using the default class loader.
+ Therefore, the jar file must reside on the server-side HBase classpath.</para>
+
+ <para>Coprocessors which are loaded in this way will be active on all regions of
+ all tables. These are the system coprocessor introduced earlier. The first listed
+ coprocessors will be assigned the priority <literal>Coprocessor.Priority.SYSTEM</literal>.
+ Each subsequent coprocessor in the list will have its priority value incremented by one
+ (which reduces its priority, because priorities have the natural sort order of Integers). </para>
+ <para>When calling out to registered observers, the framework executes their callbacks methods
+ in the sorted order of their priority. Ties are broken arbitrarily.</para>
+ </section>
+
+ <section>
+ <title>Load from the HBase Shell</title>
+ <para> You can load a coprocessor on a specific table via a table attribute. The following
+ example will load the <systemitem>FooRegionObserver</systemitem> observer when table
+ <systemitem>t1</systemitem> is read or re-read. </para>
+ <example>
+ <title>Load a Coprocessor On a Table Using HBase Shell</title>
+ <screen>
+hbase(main):005:0> <userinput>alter 't1', METHOD => 'table_att',
+ 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'</userinput>
+<computeroutput>Updating all regions with the new schema...
+1/1 regions updated.
+Done.
+0 row(s) in 1.0730 seconds</computeroutput>
+
+hbase(main):006:0> <userinput>describe 't1'</userinput>
+<computeroutput>DESCRIPTION ENABLED
+ {NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false
+ nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B
+ LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
+ => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>
+ '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ
+ E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO
+ CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',
+ BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3'
+ , COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647'
+ , KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY
+ => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
+1 row(s) in 0.0190 seconds</computeroutput>
+ </screen>
+ </example>
+
+ <para>The coprocessor framework will try to read the class information from the coprocessor
+ table attribute value. The value contains four pieces of information which are separated by
+ the <literal>|</literal> character.</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>File path: The jar file containing the coprocessor implementation must be in a
+ location where all region servers can read it. You could copy the file onto the local
+ disk on each region server, but it is recommended to store it in HDFS.</para>
+ </listitem>
+ <listitem>
+ <para>Class name: The full class name of the coprocessor.</para>
+ </listitem>
+ <listitem>
+ <para>Priority: An integer. The framework will determine the execution sequence of all
+ configured observers registered at the same hook using priorities. This field can be
+ left blank. In that case the framework will assign a default priority value.</para>
+ </listitem>
+ <listitem>
+ <para>Arguments: This field is passed to the coprocessor implementation.</para>
+ </listitem>
+ </itemizedlist>
+ <example>
+ <title>Unload a Coprocessor From a Table Using HBase Shell</title>
+ <screen>
+hbase(main):007:0> <userinput>alter 't1', METHOD => 'table_att_unset',</userinput>
+hbase(main):008:0* <userinput>NAME => 'coprocessor$1'</userinput>
+<computeroutput>Updating all regions with the new schema...
+1/1 regions updated.
+Done.
+0 row(s) in 1.1130 seconds</computeroutput>
+
+hbase(main):009:0> <userinput>describe 't1'</userinput>
+<computeroutput>DESCRIPTION ENABLED
+ {NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false
+ 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION
+ S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214
+ 7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN
+ _MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true
+ '}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>
+ 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
+ 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C
+ ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO
+ DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
+1 row(s) in 0.0180 seconds </computeroutput>
+ </screen>
+ </example>
+ <warning>
+ <para>There is no guarantee that the framework will load a given coprocessor successfully.
+ For example, the shell command neither guarantees a jar file exists at a particular
+ location nor verifies whether the given class is actually contained in the jar file.
+ </para>
+ </warning>
+ </section>
+ </section>
+ <section>
+ <title>Check the Status of a Coprocessor</title>
+ <para>To check the status of a coprocessor after it has been configured, use the
+ <command>status</command> HBase Shell command.</para>
+ <screen>
+hbase(main):020:0> <userinput>status 'detailed'</userinput>
+<computeroutput>version 0.92-tm-6
+0 regionsInTransition
+master coprocessors: []
+1 live servers
+ localhost:52761 1328082515520
+ requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995
+ -ROOT-,,0
+ numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
+storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
+totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
+ .META.,,1
+ numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
+storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
+totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
+ t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1.
+ numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
+storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
+totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN,
+coprocessors=[AggregateImplementation]
+0 dead servers </computeroutput>
+ </screen>
+ </section>
+ <section>
+ <title>Status of Coprocessors in HBase</title>
+ <para> Coprocessors and the coprocessor framework are evolving rapidly and work is ongoing on
+ several different JIRAs. </para>
+ </section>
</chapter>