Posted to commits@hbase.apache.org by an...@apache.org on 2016/10/26 20:07:35 UTC

[6/8] hbase git commit: HBASE-15347 updated asciidoc for 1.3

http://git-wip-us.apache.org/repos/asf/hbase/blob/6cb8a436/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc
index 01f2eb7..b4c39c8 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -28,7 +28,9 @@
 :experimental:
 
 This chapter expands upon the <<getting_started>> chapter to further explain configuration of Apache HBase.
-Please read this chapter carefully, especially the <<basic.prerequisites,Basic Prerequisites>> to ensure that your HBase testing and deployment goes smoothly, and prevent data loss.
+Please read this chapter carefully, especially the <<basic.prerequisites,Basic Prerequisites>>
+to ensure that your HBase testing and deployment goes smoothly, and prevent data loss.
+Familiarize yourself with <<hbase_supported_tested_definitions>> as well.
 
 == Configuration Files
 Apache HBase uses the same configuration system as Apache Hadoop.
@@ -98,6 +100,22 @@ This section lists required services and some required system configuration.
 |JDK 7
 |JDK 8
 
+|2.0
+|link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported]
+|link:http://search-hadoop.com/m/YGbbsPxZ723m3as[Not Supported]
+|yes
+
+|1.3
+|link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported]
+|yes
+|yes
+
+|1.2
+|link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported]
+|yes
+|yes
+
 |1.1
 |link:http://search-hadoop.com/m/DHED4Zlz0R1[Not Supported]
 |yes
@@ -116,11 +134,6 @@ deprecated `remove()` method of the `PoolMap` class and is under consideration.
 link:https://issues.apache.org/jira/browse/HBASE-7608[HBASE-7608] for more information about JDK 8
 support.
 
-|0.96
-|yes
-|yes
-|N/A
-
 |0.94
 |yes
 |yes
@@ -129,6 +142,7 @@ support.
 
 NOTE: In HBase 0.98.5 and newer, you must set `JAVA_HOME` on each node of your cluster. _hbase-env.sh_ provides a handy mechanism to do this.
 
+[[os]]
 .Operating System Utilities
 ssh::
   HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
@@ -143,6 +157,7 @@ Loopback IP::
 NTP::
   The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism, on your cluster, and that all nodes look to the same service for time synchronization. See the link:http://www.tldp.org/LDP/sag/html/basic-ntp-config.html[Basic NTP Configuration] at [citetitle]_The Linux Documentation Project (TLDP)_ to set up NTP.
 
+[[ulimit]]
 Limits on Number of Files and Processes (ulimit)::
   Apache HBase is a database. It requires the ability to open a large number of files at once. Many Linux distributions limit the number of files a single user is allowed to open to `1024` (or `256` on older versions of OS X). You can check this limit on your servers by running the command `ulimit -n` when logged in as the user which runs HBase. See <<trouble.rs.runtime.filehandles,the Troubleshooting section>> for some of the problems you may experience if the limit is too low. You may also notice errors such as the following:
 +
@@ -162,7 +177,7 @@ For example, assuming that a schema had 3 ColumnFamilies per region with an aver
 +
 Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions. See Jack Levin's major HDFS issues thread on the hbase-users mailing list, from 2011.
 +
-Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read setting config on you hadoop cluster is Aaron Kimballs' Configuration Parameters: What can you just ignore?
+Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read on setting configuration parameters on your Hadoop cluster is Aaron Kimball's Configuration Parameters: What can you just ignore?
 +
 .`ulimit` Settings on Ubuntu
 ====
@@ -210,24 +225,39 @@ Use the following legend to interpret this table:
 * "X" = not supported
 * "NT" = Not tested
 
-[cols="1,1,1,1,1,1,1", options="header"]
+[cols="1,1,1,1,1,1,1,1", options="header"]
 |===
-| | HBase-0.92.x | HBase-0.94.x | HBase-0.96.x | HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.) | HBase-1.0.x (Hadoop 1.x is NOT supported) | HBase-1.1.x
-|Hadoop-0.20.205 | S | X | X | X | X | X
-|Hadoop-0.22.x | S | X | X | X | X | X
-|Hadoop-1.0.x  |X | X | X | X | X | X
-|Hadoop-1.1.x | NT | S | S | NT | X | X
-|Hadoop-0.23.x | X | S | NT | X | X | X
-|Hadoop-2.0.x-alpha | X | NT | X | X | X | X
-|Hadoop-2.1.0-beta | X | NT | S | X | X | X
-|Hadoop-2.2.0 | X | NT | S | S | NT | NT
-|Hadoop-2.3.x | X | NT | S | S | NT | NT
-|Hadoop-2.4.x | X | NT | S | S | S | S
-|Hadoop-2.5.x | X | NT | S | S | S | S
-|Hadoop-2.6.x | X | NT | NT | NT | S | S
-|Hadoop-2.7.x | X | NT | NT | NT | NT | NT
+| | HBase-0.94.x | HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.) | HBase-1.0.x (Hadoop 1.x is NOT supported) | HBase-1.1.x | HBase-1.2.x | HBase-1.3.x | HBase-2.0.x
+|Hadoop-1.0.x  | X | X | X | X | X | X | X
+|Hadoop-1.1.x | S | NT | X | X | X | X | X
+|Hadoop-0.23.x | S | X | X | X | X | X | X
+|Hadoop-2.0.x-alpha | NT | X | X | X | X | X | X
+|Hadoop-2.1.0-beta | NT | X | X | X | X | X | X
+|Hadoop-2.2.0 | NT | S | NT | NT | X  | X | X
+|Hadoop-2.3.x | NT | S | NT | NT | X  | X | X
+|Hadoop-2.4.x | NT | S | S | S | S | S | X
+|Hadoop-2.5.x | NT | S | S | S | S | S | X
+|Hadoop-2.6.0 | X | X | X | X | X | X | X
+|Hadoop-2.6.1+ | NT | NT | NT | NT | S | S | S
+|Hadoop-2.7.0 | X | X | X | X | X | X | X
+|Hadoop-2.7.1+ | NT | NT | NT | NT | S | S | S
 |===
 
+.Hadoop 2.6.x
+[TIP]
+====
+Hadoop distributions based on the 2.6.x line *must* have
+link:https://issues.apache.org/jira/browse/HADOOP-11710[HADOOP-11710] applied if you plan to run
+HBase on top of an HDFS Encryption Zone. Failure to do so will result in cluster failure and
+data loss. This patch is present in Apache Hadoop releases 2.6.1+.
+====
+
+.Hadoop 2.7.x
+[TIP]
+====
+Hadoop version 2.7.0 is not tested or supported as the Hadoop PMC has explicitly labeled that release as not being stable.
+====
+
 .Replace the Hadoop Bundled With HBase!
 [NOTE]
 ====
@@ -312,26 +342,7 @@ Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop. See link:h
 [[hadoop.older.versions]]
 ==== Hadoop versions 0.20.x - 1.x
 
-HBase will lose data unless it is running on an HDFS that has a durable `sync` implementation.
-DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, and Hadoop 0.20.204.0 which DO NOT have this attribute.
-Currently only Hadoop versions 0.20.205.x or any release in excess of this version -- this includes hadoop-1.0.0 -- have a working, durable sync.
-The Cloudera blog post link:http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/[An
-            update on Apache Hadoop 1.0] by Charles Zedlweski has a nice exposition on how all the Hadoop versions relate.
-It's worth checking out if you are having trouble making sense of the Hadoop version morass.
-
-Sync has to be explicitly enabled by setting `dfs.support.append` equal to true on both the client side -- in _hbase-site.xml_ -- and on the serverside in _hdfs-site.xml_ (The sync facility HBase needs is a subset of the append code path).
-
-[source,xml]
-----
-
-<property>
-  <name>dfs.support.append</name>
-  <value>true</value>
-</property>
-----
-
-You will have to restart your cluster after making this edit.
-Ignore the chicken-little comment you'll find in the _hdfs-default.xml_ in the description for the `dfs.support.append` configuration.
+DO NOT use Hadoop versions older than 2.2.0 for HBase versions greater than 1.0. If you are using an older version of HBase, check its release documentation for Hadoop-related information.
 
 [[hadoop.security]]
 ==== Apache HBase on Secure Hadoop
@@ -373,8 +384,9 @@ See also <<casestudies.max.transfer.threads,casestudies.max.transfer.threads>> a
 === ZooKeeper Requirements
 
 ZooKeeper 3.4.x is required as of HBase 1.0.0.
-HBase makes use of the `multi` functionality that is only available since 3.4.0 (The `useMulti` configuration option defaults to `true` in HBase 1.0.0).
-See link:https://issues.apache.org/jira/browse/HBASE-12241[HBASE-12241 (The crash of regionServer when taking deadserver's replication queue breaks replication)] and link:https://issues.apache.org/jira/browse/HBASE-6775[HBASE-6775 (Use ZK.multi when available for HBASE-6710 0.92/0.94 compatibility fix)] for background.
+HBase makes use of the `multi` functionality that is only available since ZooKeeper 3.4.0. The `hbase.zookeeper.useMulti` configuration property defaults to `true` in HBase 1.0.0.
+Refer to link:https://issues.apache.org/jira/browse/HBASE-12241[HBASE-12241 (The crash of regionServer when taking deadserver's replication queue breaks replication)] and link:https://issues.apache.org/jira/browse/HBASE-6775[HBASE-6775 (Use ZK.multi when available for HBASE-6710 0.92/0.94 compatibility fix)] for background.
+The property is deprecated and `useMulti` is always enabled in HBase 2.0.
 
 [[standalone_dist]]
 == HBase run modes: Standalone and Distributed
@@ -392,11 +404,12 @@ Set [var]+JAVA_HOME+ to point at the root of your +java+ install.
 This is the default mode.
 Standalone mode is what is described in the <<quickstart,quickstart>> section.
 In standalone mode, HBase does not use HDFS -- it uses the local filesystem instead -- and it runs all HBase daemons and a local ZooKeeper all up in the same JVM.
-Zookeeper binds to a well known port so clients may talk to HBase.
+ZooKeeper binds to a well known port so clients may talk to HBase.
 
+[[distributed]]
 === Distributed
 
-Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a _pseudo-distributed_ -- and _fully-distributed_ where the daemons are spread across all nodes in the cluster.
+Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a. _pseudo-distributed_ -- and _fully-distributed_ where the daemons are spread across all nodes in the cluster.
 The _pseudo-distributed_ vs. _fully-distributed_ nomenclature comes from Hadoop.
 
 Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the _Hadoop Distributed File System_ (HDFS). Fully-distributed mode can ONLY run on HDFS.
@@ -433,7 +446,7 @@ In addition, the cluster is configured so that multiple cluster nodes enlist as
 These configuration basics are all demonstrated in <<quickstart_fully_distributed,quickstart-fully-distributed>>.
 
 .Distributed RegionServers
-Typically, your cluster will contain multiple RegionServers all running on different servers, as well as primary and backup Master and Zookeeper daemons.
+Typically, your cluster will contain multiple RegionServers all running on different servers, as well as primary and backup Master and ZooKeeper daemons.
 The _conf/regionservers_ file on the master server contains a list of hosts whose RegionServers are associated with this cluster.
 Each host is on a separate line.
 All hosts listed in this file will have their RegionServer processes started and stopped when the master server starts or stops.
@@ -526,7 +539,7 @@ HBase logs can be found in the _logs_ subdirectory.
 Check them out especially if HBase had trouble starting.
 
 HBase also puts up a UI listing vital attributes.
-By default it's deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named `master.example.org` on the default port, point your browser at _http://master.example.org:16010_ to see the web interface.
+By default it's deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named `master.example.org` on the default port, point your browser at pass:[http://master.example.org:16010] to see the web interface.
 
 Prior to HBase 0.98 the master UI was deployed on port 60010, and the HBase RegionServers UI on port 60030.
 
@@ -550,7 +563,7 @@ If you are running a distributed operation, be sure to wait until HBase has shut
 === _hbase-site.xml_ and _hbase-default.xml_
 
 Just as in Hadoop where you add site-specific HDFS configuration to the _hdfs-site.xml_ file, for HBase, site specific customizations go into the file _conf/hbase-site.xml_.
-For the list of configurable properties, see <<hbase_default_configurations,hbase default configurations>> below or view the raw _hbase-default.xml_ source file in the HBase source code at _src/main/resources_. 
+For the list of configurable properties, see <<hbase_default_configurations,hbase default configurations>> below or view the raw _hbase-default.xml_ source file in the HBase source code at _src/main/resources_.
 
 Not all configuration options make it out to _hbase-default.xml_.
 Configuration that it is thought rare anyone would change can exist only in code; the only way to turn up such configurations is via a reading of the source code itself.
@@ -558,7 +571,7 @@ Configuration that it is thought rare anyone would change can exist only in code
 Currently, changes here will require a cluster restart for HBase to notice the change.
 // hbase/src/main/asciidoc
 //
-include::../../../../target/asciidoc/hbase-default.adoc[]
+include::{docdir}/../../../target/asciidoc/hbase-default.adoc[]
 
 
 [[hbase.env.sh]]
@@ -590,7 +603,7 @@ ZooKeeper is where all these values are kept.
 Thus clients require the location of the ZooKeeper ensemble before they can do anything else.
 Usually this the ensemble location is kept out in the _hbase-site.xml_ and is picked up by the client from the `CLASSPATH`.
 
-If you are configuring an IDE to run a HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests). 
+If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
 
 Minimally, a client of HBase needs several libraries in its `CLASSPATH` when connecting to a cluster, including:
 [source]
@@ -607,7 +620,7 @@ slf4j-log4j (slf4j-log4j12-1.5.8.jar)
 zookeeper (zookeeper-3.4.2.jar)
 ----
 
-An example basic _hbase-site.xml_ for client only might look as follows: 
+An example basic _hbase-site.xml_ for client only might look as follows:
 [source,xml]
 ----
 <?xml version="1.0"?>
@@ -683,8 +696,8 @@ Below we show what the main configuration files -- _hbase-site.xml_, _regionserv
     <name>hbase.cluster.distributed</name>
     <value>true</value>
     <description>The mode the cluster will be in. Possible values are
-      false: standalone and pseudo-distributed setups with managed Zookeeper
-      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
+      false: standalone and pseudo-distributed setups with managed ZooKeeper
+      true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)
     </description>
   </property>
 </configuration>
@@ -716,7 +729,7 @@ The following lines in the _hbase-env.sh_ file show how to set the `JAVA_HOME` e
 
 ----
 # The java implementation to use.
-export JAVA_HOME=/usr/java/jdk1.7.0/
+export JAVA_HOME=/usr/java/jdk1.8.0/
 
 # The maximum amount of heap to use. Default is left to JVM default.
 export HBASE_HEAPSIZE=4G
@@ -744,14 +757,7 @@ See link:https://issues.apache.org/jira/browse/HBASE-6389[HBASE-6389 Modify the
             conditions to ensure that Master waits for sufficient number of Region Servers before
             starting region assignments] for more detail.
 
-[[backup.master.fail.fast]]
-==== If a backup Master exists, make the primary Master fail fast
-
-If the primary Master loses its connection with ZooKeeper, it will fall into a loop where it keeps trying to reconnect.
-Disable this functionality if you are running more than one Master: i.e. a backup Master.
-Failing to do so, the dying Master may continue to receive RPCs though another Master has assumed the role of primary.
-See the configuration <<fail.fast.expired.active.master,fail.fast.expired.active.master>>.
-
+[[recommended_configurations]]
 === Recommended Configurations
 
 [[recommended_configurations.zk]]
@@ -903,7 +909,7 @@ See <<master.processes.loadbalancer,master.processes.loadbalancer>> for more inf
 ==== Disabling Blockcache
 
 Do not turn off block cache (You'd do it by setting `hbase.block.cache.size` to zero). Currently we do not do well if you do this because the RegionServer will spend all its time loading HFile indices over and over again.
-If your working set it such that block cache does you no good, at least size the block cache such that HFile indices will stay up in the cache (you can get a rough idea on the size you need by surveying RegionServer UIs; you'll see index block size accounted near the top of the webpage).
+If your working set is such that block cache does you no good, at least size the block cache such that HFile indices will stay up in the cache (you can get a rough idea on the size you need by surveying RegionServer UIs; you'll see index block size accounted near the top of the webpage).
 
 [[nagles]]
 ==== link:http://en.wikipedia.org/wiki/Nagle's_algorithm[Nagle's] or the small package problem
@@ -916,7 +922,7 @@ You might also see the graphs on the tail of link:https://issues.apache.org/jira
 ==== Better Mean Time to Recover (MTTR)
 
 This section is about configurations that will make servers come back faster after a fail.
-See the Deveraj Das an Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
+See the Deveraj Das and Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
 
 The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 forces Namenode into loop with lease recovery requests] is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes added to HDFS. Read the Varun Sharma comments.
 The below suggested configurations are Varun's suggestions distilled and tested.
@@ -988,7 +994,7 @@ See the link:http://docs.oracle.com/javase/6/docs/technotes/guides/management/ag
 Historically, besides above port mentioned, JMX opens two additional random TCP listening ports, which could lead to port conflict problem. (See link:https://issues.apache.org/jira/browse/HBASE-10289[HBASE-10289] for details)
 
 As an alternative, You can use the coprocessor-based JMX implementation provided by HBase.
-To enable it in 0.99 or above, add below property in _hbase-site.xml_: 
+To enable it in 0.99 or above, add below property in _hbase-site.xml_:
 
 [source,xml]
 ----
@@ -1019,7 +1025,7 @@ The registry port can be shared with connector port in most cases, so you only n
 However if you want to use SSL communication, the 2 ports must be configured to different values.
 
 By default the password authentication and SSL communication is disabled.
-To enable password authentication, you need to update _hbase-env.sh_          like below: 
+To enable password authentication, you need to update _hbase-env.sh_          like below:
 [source,bash]
 ----
 export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.authenticate=true                  \
@@ -1046,7 +1052,7 @@ keytool -export -alias jconsole -keystore myKeyStore -file jconsole.cert
 keytool -import -alias jconsole -keystore jconsoleKeyStore -file jconsole.cert
 ----
 
-And then update _hbase-env.sh_ like below: 
+And then update _hbase-env.sh_ like below:
 
 [source,bash]
 ----
@@ -1068,12 +1074,12 @@ Finally start `jconsole` on the client using the key store:
 jconsole -J-Djavax.net.ssl.trustStore=/home/tianq/jconsoleKeyStore
 ----
 
-NOTE: To enable the HBase JMX implementation on Master, you also need to add below property in _hbase-site.xml_: 
+NOTE: To enable the HBase JMX implementation on Master, you also need to add below property in _hbase-site.xml_:
 
 [source,xml]
 ----
 <property>
-  <ame>hbase.coprocessor.master.classes</name>
+  <name>hbase.coprocessor.master.classes</name>
   <value>org.apache.hadoop.hbase.JMXListener</value>
 </property>
 ----

http://git-wip-us.apache.org/repos/asf/hbase/blob/6cb8a436/src/main/asciidoc/_chapters/cp.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/cp.adoc b/src/main/asciidoc/_chapters/cp.adoc
index a99e903..1817dd3 100644
--- a/src/main/asciidoc/_chapters/cp.adoc
+++ b/src/main/asciidoc/_chapters/cp.adoc
@@ -27,203 +27,796 @@
 :icons: font
 :experimental:
 
-HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, pages 66-67.). Coprocessors function in a similar way to Linux kernel modules.
-They provide a way to run server-level code against locally-stored data.
-The functionality they provide is very powerful, but also carries great risk and can have adverse effects on the system, at the level of the operating system.
-The information in this chapter is primarily sourced and heavily reused from Mingjie Lai's blog post at https://blogs.apache.org/hbase/entry/coprocessor_introduction.
+HBase Coprocessors are modeled after Google BigTable's coprocessor implementation
+(http://research.google.com/people/jeff/SOCC2010-keynote-slides.pdf pages 41-42.).
 
-Coprocessors are not designed to be used by end users of HBase, but by HBase developers who need to add specialized functionality to HBase.
-One example of the use of coprocessors is pluggable compaction and scan policies, which are provided as coprocessors in link:https://issues.apache.org/jira/browse/HBASE-6427[HBASE-6427].
+The coprocessor framework provides mechanisms for running your custom code directly on
+the RegionServers managing your data. Efforts are ongoing to bridge gaps between HBase's
+implementation and BigTable's architecture. For more information see
+link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].
 
-== Coprocessor Framework
+The information in this chapter is primarily sourced and heavily reused from the following
+resources:
 
-The implementation of HBase coprocessors diverges from the BigTable implementation.
-The HBase framework provides a library and runtime environment for executing user code within the HBase region server and master processes.
+. Mingjie Lai's blog post
+link:https://blogs.apache.org/hbase/entry/coprocessor_introduction[Coprocessor Introduction].
+. Gaurav Bhardwaj's blog post
+link:http://www.3pillarglobal.com/insights/hbase-coprocessors[The How To Of HBase Coprocessors].
 
-The framework API is provided in the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor] package.
+[WARNING]
+.Use Coprocessors At Your Own Risk
+====
+Coprocessors are an advanced feature of HBase and are intended to be used by system
+developers only. Because coprocessor code runs directly on the RegionServer and has
+direct access to your data, they introduce the risk of data corruption, man-in-the-middle
+attacks, or other malicious data access. Currently, there is no mechanism to prevent
+data corruption by coprocessors, though work is underway on
+link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].
+
+In addition, there is no resource isolation, so a well-intentioned but misbehaving
+coprocessor can severely degrade cluster performance and stability.
+====
 
-Two different types of coprocessors are provided by the framework, based on their scope.
+== Coprocessor Overview
 
-.Types of Coprocessors
+In HBase, you fetch data using a `Get` or `Scan`, whereas in an RDBMS you use a SQL
+query. In order to fetch only the relevant data, you filter it using an HBase
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter],
+whereas in an RDBMS you use a `WHERE` predicate.
 
-System Coprocessors::
-  System coprocessors are loaded globally on all tables and regions hosted by a region server.
+After fetching the data, you perform computations on it. This paradigm works well
+for "small data" with a few thousand rows and several columns. However, when you scale
+to billions of rows and millions of columns, moving large amounts of data across your
+network will create bottlenecks at the network layer, and the client needs to be powerful
+enough and have enough memory to handle the large amounts of data and the computations.
+In addition, the client code can grow large and complex.
+
+In this scenario, coprocessors might make sense. You can put the business computation
+code into a coprocessor which runs on the RegionServer, in the same location as the
+data, and returns the result to the client.
+
+This is only one scenario where using coprocessors can provide benefit. Following
+are some analogies which may help to explain some of the benefits of coprocessors.
+
+[[cp_analogies]]
+=== Coprocessor Analogies
+
+Triggers and Stored Procedures::
+  An Observer coprocessor is similar to a trigger in an RDBMS in that it executes
+  your code either before or after a specific event (such as a `Get` or `Put`)
+  occurs. An endpoint coprocessor is similar to a stored procedure in an RDBMS
+  because it allows you to perform custom computations on the data on the
+  RegionServer itself, rather than on the client.
 
-Table Coprocessors::
-  You can specify which coprocessors should be loaded on all regions for a table on a per-table basis.
+MapReduce::
+  MapReduce operates on the principle of moving the computation to the location of
+  the data. Coprocessors operate on the same principle.
 
-The framework provides two different aspects of extensions as well: _observers_ and _endpoints_.
+AOP::
+  If you are familiar with Aspect Oriented Programming (AOP), you can think of a coprocessor
+  as applying advice by intercepting a request and then running some custom code,
+  before passing the request on to its final destination (or even changing the destination).
+
+
+=== Coprocessor Implementation Overview
+
+. Either your class should extend one of the Coprocessor classes, such as
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver],
+or it should implement the link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/Coprocessor.html[Coprocessor]
+or
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/CoprocessorService.html[CoprocessorService]
+interface.
+
+. Load the coprocessor, either statically (from the configuration) or dynamically,
+using HBase Shell. For more details see <<cp_loading,Loading Coprocessors>>.
+
+. Call the coprocessor from your client-side code. HBase handles the coprocessor
+transparently.
+
+The framework API is provided in the
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor]
+package.
+
+== Types of Coprocessors
+
+=== Observer Coprocessors
+
+Observer coprocessors are triggered either before or after a specific event occurs.
+Observers that happen before an event use methods that start with a `pre` prefix,
+such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`prePut`]. Observers that happen just after an event override methods that start
+with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`postPut`].
+
+
+==== Use Cases for Observer Coprocessors
+Security::
+  Before performing a `Get` or `Put` operation, you can check for permission using
+  `preGet` or `prePut` methods.
+
+Referential Integrity::
+  HBase does not directly support the RDBMS concept of referential integrity, also known
+  as foreign keys. You can use a coprocessor to enforce such integrity. For instance,
+  if you have a business rule that every insert to the `users` table must be followed
+  by a corresponding entry in the `user_daily_attendance` table, you could implement
+  a coprocessor to use the `prePut` method on `users` to insert a record into
+  `user_daily_attendance` (a minimal sketch of this idea follows this list).
+
+Secondary Indexes::
+  You can use a coprocessor to maintain secondary indexes. For more information, see
+  link:http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
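+
+As an illustration only, a minimal sketch of the referential-integrity idea might look like the
+following. The observer class name, the column family `attDet`, and the column `lastSeen` are
+hypothetical, and error handling is omitted:
+
+[source,java]
+----
+public class AttendanceObserver extends BaseRegionObserver {
+
+    // Hypothetical companion table that must receive an entry for every insert to 'users'.
+    private static final TableName ATTENDANCE = TableName.valueOf("user_daily_attendance");
+
+    @Override
+    public void prePut(final ObserverContext<RegionCoprocessorEnvironment> e, final Put put,
+            final WALEdit edit, final Durability durability) throws IOException {
+        // Runs before every Put to 'users'; mirror the row key into the attendance table.
+        Table attendance = e.getEnvironment().getTable(ATTENDANCE);
+        try {
+            Put entry = new Put(put.getRow());
+            entry.addColumn(Bytes.toBytes("attDet"), Bytes.toBytes("lastSeen"),
+                Bytes.toBytes(System.currentTimeMillis()));
+            attendance.put(entry);
+        } finally {
+            attendance.close();
+        }
+    }
+}
+----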
+
+
+==== Types of Observer Coprocessor
+
+RegionObserver::
+  A RegionObserver coprocessor allows you to observe events on a region, such as `Get`
+  and `Put` operations. See
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver].
+  Consider overriding the convenience class
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver],
+  which implements the `RegionObserver` interface and will not break if new methods are added.
+
+RegionServerObserver::
+  A RegionServerObserver allows you to observe events related to the RegionServer's
+  operation, such as starting, stopping, or performing merges, commits, or rollbacks.
+  See
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html[RegionServerObserver].
+  Consider overriding the convenience class
+  https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterAndRegionObserver.html[BaseMasterAndRegionObserver]
+  which implements both `MasterObserver` and `RegionServerObserver` interfaces and
+  will not break if new methods are added.
+
+MasterObserver::
+  A MasterObserver allows you to observe events related to the HBase Master, such
+  as table creation, deletion, or schema modification. See
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html[MasterObserver].
+  Consider overriding the convenience class
+  https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseMasterAndRegionObserver.html[BaseMasterAndRegionObserver],
+  which implements both `MasterObserver` and `RegionServerObserver` interfaces and
+  will not break if new methods are added.
+
+WALObserver::
+  A WalObserver allows you to observe events related to writes to the Write-Ahead
+  Log (WAL). See
+  link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
+  Consider overriding the convenience class
+  link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseWALObserver.html[BaseWALObserver],
+  which implements the `WALObserver` interface and will not break if new methods are added.
+
+<<cp_example,Examples>> provides working examples of observer coprocessors.
+
+
+
+[[cpeps]]
+=== Endpoint Coprocessor
+
+Endpoint coprocessors allow you to perform computation at the location of the data.
+See <<cp_analogies, Coprocessor Analogy>>. An example is the need to calculate a running
+average or summation for an entire table which spans hundreds of regions.
+
+In contrast to observer coprocessors, where your code is run transparently, endpoint
+coprocessors must be explicitly invoked using the
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html#coprocessorService%28java.lang.Class,%20byte%5B%5D,%20byte%5B%5D,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call%29[coprocessorService()]
+method available in
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html[Table],
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTableInterface.html[HTableInterface],
+or
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable].
+
+Starting with HBase 0.96, endpoint coprocessors are implemented using Google Protocol
+Buffers (protobuf). For more details on protobuf, see Google's
+link:https://developers.google.com/protocol-buffers/docs/proto[Protocol Buffer Guide].
+Endpoint Coprocessors written in version 0.94 are not compatible with version 0.96 or later
+(see link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]). To upgrade your
+HBase cluster from 0.94 or earlier to 0.96 or later, you need to reimplement your
+coprocessor.
+
+Coprocessor Endpoints should make no use of HBase internals and
+only avail themselves of public APIs; ideally a CPEP should depend on Interfaces
+and data structures only. This is not always possible, but beware
+that depending on internals makes the Endpoint brittle, liable to breakage as HBase
+internals evolve. HBase internal APIs annotated as private or evolving
+do not have to respect semantic versioning rules or general java rules on
+deprecation before removal. While generated protobuf files are
+absent the hbase audience annotations -- they are created by the
+protobuf protoc tool which knows nothing of how HBase works --
+they should be considered `@InterfaceAudience.Private` and so are liable to
+change.
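+
+For illustration, a client-side invocation might look roughly like the sketch below. It assumes
+the protobuf-generated `SumService`, `SumRequest` and `SumResponse` classes from the Endpoint
+example later in this chapter, and a coprocessor already loaded on the `users` table:
+
+[source,java]
+----
+Configuration conf = HBaseConfiguration.create();
+Connection connection = ConnectionFactory.createConnection(conf);
+Table table = connection.getTable(TableName.valueOf("users"));
+final SumRequest request = SumRequest.newBuilder().setFamily("salaryDet").setColumn("gross").build();
+try {
+    // Invoke the endpoint on every region of the table (null start/end keys) and
+    // collect one partial sum per region.
+    Map<byte[], Long> results = table.coprocessorService(SumService.class, null, null,
+        new Batch.Call<SumService, Long>() {
+            @Override
+            public Long call(SumService service) throws IOException {
+                BlockingRpcCallback<SumResponse> rpcCallback = new BlockingRpcCallback<SumResponse>();
+                service.getSum(null, request, rpcCallback);
+                SumResponse response = rpcCallback.get();
+                return response.hasSum() ? response.getSum() : 0L;
+            }
+        });
+    long totalSum = 0;
+    for (Long regionSum : results.values()) {
+        totalSum += regionSum;
+    }
+    System.out.println("Total sum = " + totalSum);
+} catch (ServiceException e) {
+    e.printStackTrace();
+} catch (Throwable t) {
+    t.printStackTrace();
+}
+----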
+
+<<cp_example,Examples>> provides working examples of endpoint coprocessors.
+
+[[cp_loading]]
+== Loading Coprocessors
+
+To make your coprocessor available to HBase, it must be _loaded_, either statically
+(through the HBase configuration) or dynamically (using HBase Shell or the Java API).
+
+=== Static Loading
+
+Follow these steps to statically load your coprocessor. Keep in mind that you must
+restart HBase to unload a coprocessor that has been loaded statically.
+
+. Define the Coprocessor in _hbase-site.xml_, with a <property> element with a <name>
+and a <value> sub-element. The <name> should be one of the following:
++
+- `hbase.coprocessor.region.classes` for RegionObservers and Endpoints.
+- `hbase.coprocessor.wal.classes` for WALObservers.
+- `hbase.coprocessor.master.classes` for MasterObservers.
++
+<value> must contain the fully-qualified class name of your coprocessor's implementation
+class.
++
+For example, to load a Coprocessor (implemented in class SumEndPoint.java) you have to create the
+following entry in the RegionServer's 'hbase-site.xml' file (generally located under the 'conf' directory):
++
+[source,xml]
+----
+<property>
+    <name>hbase.coprocessor.region.classes</name>
+    <value>org.myname.hbase.coprocessor.endpoint.SumEndPoint</value>
+</property>
+----
++
+If multiple classes are specified for loading, the class names must be comma-separated.
+The framework attempts to load all the configured classes using the default class loader.
+Therefore, the jar file must reside on the server-side HBase classpath.
++
+Coprocessors which are loaded in this way will be active on all regions of all tables.
+These are also called system Coprocessors.
+The first listed Coprocessor will be assigned the priority `Coprocessor.Priority.SYSTEM`.
+Each subsequent coprocessor in the list will have its priority value incremented by one (which
+reduces its priority, because priorities have the natural sort order of Integers).
++
+When calling out to registered observers, the framework executes their callback methods in the
+sorted order of their priority. +
+Ties are broken arbitrarily.
 
-Observers::
-  Observers are analogous to triggers in conventional databases.
-  They allow you to insert user code by overriding upcall methods provided by the coprocessor framework.
-  Callback functions are executed from core HBase code when events occur.
-  Callbacks are handled by the framework, and the coprocessor itself only needs to insert the extended or alternate functionality.
+. Put your code on HBase's classpath. One easy way to do this is to drop the jar
+  (containing your code and all the dependencies) into the `lib/` directory in the
+  HBase installation.
 
-Endpoints (HBase 0.96.x and later)::
-  The implementation for endpoints changed significantly in HBase 0.96.x due to the introduction of protocol buffers (protobufs) (link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5488]). If you created endpoints before 0.96.x, you will need to rewrite them.
-  Endpoints are now defined and callable as protobuf services, rather than endpoint invocations passed through as Writable blobs
+. Restart HBase.
 
-Endpoints (HBase 0.94.x and earlier)::
-  Dynamic RPC endpoints resemble stored procedures.
-  An endpoint can be invoked at any time from the client.
-  When it is invoked, it is executed remotely at the target region or regions, and results of the executions are returned to the client.
 
-== Examples
+=== Static Unloading
 
-An example of an observer is included in _hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestZooKeeperScanPolicyObserver.java_.
-Several endpoint examples are included in the same directory.
+. Delete the coprocessor's <property> element, including sub-elements, from `hbase-site.xml`.
+. Restart HBase.
+. Optionally, remove the coprocessor's JAR file from the classpath or HBase's `lib/`
+  directory.
 
-== Building A Coprocessor
 
-Before you can build a processor, it must be developed, compiled, and packaged in a JAR file.
-The next step is to configure the coprocessor framework to use your coprocessor.
-You can load the coprocessor from your HBase configuration, so that the coprocessor starts with HBase, or you can configure the coprocessor from the HBase shell, as a table attribute, so that it is loaded dynamically when the table is opened or reopened.
+=== Dynamic Loading
 
-=== Load from Configuration
+You can also load a coprocessor dynamically, without restarting HBase. This may seem
+preferable to static loading, but dynamically loaded coprocessors are loaded on a
+per-table basis, and are only available to the table for which they were loaded. For
+this reason, dynamically loaded coprocessors are sometimes called *Table Coprocessors*.
 
-To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's _hbase-site.xml_ and configure one of the following properties, based on the type of observer you are configuring:
+In addition, dynamically loading a coprocessor acts as a schema change on the table,
+and the table must be taken offline to load the coprocessor.
 
-* `hbase.coprocessor.region.classes`for RegionObservers and Endpoints
-* `hbase.coprocessor.wal.classes`for WALObservers
-* `hbase.coprocessor.master.classes`for MasterObservers
+There are three ways to dynamically load a Coprocessor.
 
-.Example RegionObserver Configuration
+[NOTE]
+.Assumptions
 ====
-In this example, one RegionObserver is configured for all the HBase tables.
+The instructions below make the following assumptions:
 
-[source,xml]
+* A JAR called `coprocessor.jar` contains the Coprocessor implementation along with all of its
+dependencies.
+* The JAR is available in HDFS in some location like
+`hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar`.
+====
+
+[[load_coprocessor_in_shell]]
+==== Using HBase Shell
+
+. Disable the table using HBase Shell:
++
+[source]
 ----
-<property>
-  <name>hbase.coprocessor.region.classes</name>
-  <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
-</property>
+hbase> disable 'users'
 ----
-====
 
-If multiple classes are specified for loading, the class names must be comma-separated.
-The framework attempts to load all the configured classes using the default class loader.
-Therefore, the jar file must reside on the server-side HBase classpath.
+. Load the Coprocessor, using a command like the following:
++
+[source]
+----
+hbase> alter 'users', METHOD => 'table_att', 'Coprocessor'=>'hdfs://<namenode>:<port>/
+user/<hadoop-user>/coprocessor.jar|org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|
+arg1=1,arg2=2'
+----
++
+The Coprocessor framework will try to read the class information from the coprocessor table
+attribute value.
+The value contains four pieces of information which are separated by the pipe (`|`) character.
++
+* File path: The jar file containing the Coprocessor implementation must be in a location where
+all region servers can read it. +
+You could copy the file onto the local disk on each region server, but it is recommended to store
+it in HDFS. +
+link:https://issues.apache.org/jira/browse/HBASE-14548[HBASE-14548] allows a directory containing the jars
+or some wildcards to be specified, such as: hdfs://<namenode>:<port>/user/<hadoop-user>/ or
+hdfs://<namenode>:<port>/user/<hadoop-user>/*.jar. Please note that if a directory is specified,
+all jar files (.jar) in the directory are added. It does not search for files in sub-directories.
+Do not use a wildcard if you would like to specify a directory. This enhancement applies to the
+usage via the JAVA API as well.
+* Class name: The full class name of the Coprocessor.
+* Priority: An integer. The framework will determine the execution sequence of all configured
+observers registered at the same hook using priorities. This field can be left blank. In that
+case the framework will assign a default priority value.
+* Arguments (Optional): This field is passed to the Coprocessor implementation. This is optional.
+
+. Enable the table.
++
+----
+hbase(main):003:0> enable 'users'
+----
 
-Coprocessors which are loaded in this way will be active on all regions of all tables.
-These are the system coprocessor introduced earlier.
-The first listed coprocessors will be assigned the priority `Coprocessor.Priority.SYSTEM`.
-Each subsequent coprocessor in the list will have its priority value incremented by one (which reduces its priority, because priorities have the natural sort order of Integers).
+. Verify that the coprocessor loaded:
++
+----
+hbase(main):04:0> describe 'users'
+----
++
+The coprocessor should be listed in the `TABLE_ATTRIBUTES`.
 
-When calling out to registered observers, the framework executes their callbacks methods in the sorted order of their priority.
-Ties are broken arbitrarily.
+==== Using the Java API (all HBase versions)
 
-=== Load from the HBase Shell
+The following Java code shows how to use the `setValue()` method of `HTableDescriptor`
+to load a coprocessor on the `users` table.
 
-You can load a coprocessor on a specific table via a table attribute.
-The following example will load the `FooRegionObserver` observer when table `t1` is read or re-read.
+[source,java]
+----
+TableName tableName = TableName.valueOf("users");
+String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
+Configuration conf = HBaseConfiguration.create();
+Connection connection = ConnectionFactory.createConnection(conf);
+Admin admin = connection.getAdmin();
+admin.disableTable(tableName);
+HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
+HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
+columnFamily1.setMaxVersions(3);
+hTableDescriptor.addFamily(columnFamily1);
+HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
+columnFamily2.setMaxVersions(3);
+hTableDescriptor.addFamily(columnFamily2);
+hTableDescriptor.setValue("COPROCESSOR$1", path + "|"
++ RegionObserverExample.class.getCanonicalName() + "|"
++ Coprocessor.PRIORITY_USER);
+admin.modifyTable(tableName, hTableDescriptor);
+admin.enableTable(tableName);
+----
 
-.Load a Coprocessor On a Table Using HBase Shell
-====
+==== Using the Java API (HBase 0.96+ only)
+
+In HBase 0.96 and newer, the `addCoprocessor()` method of `HTableDescriptor` provides
+an easier way to load a coprocessor dynamically.
+
+[source,java]
 ----
-hbase(main):005:0>  alter 't1', METHOD => 'table_att',
-  'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'
-Updating all regions with the new schema...
-1/1 regions updated.
-Done.
-0 row(s) in 1.0730 seconds
-
-hbase(main):006:0> describe 't1'
-DESCRIPTION                                                        ENABLED
- {NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false
- nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B
- LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
-  => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>
- '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ
- E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO
- CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',
-  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3'
- , COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647'
- , KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY
- => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
-1 row(s) in 0.0190 seconds
+TableName tableName = TableName.valueOf("users");
+Path path = new Path("hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar");
+Configuration conf = HBaseConfiguration.create();
+Connection connection = ConnectionFactory.createConnection(conf);
+Admin admin = connection.getAdmin();
+admin.disableTable(tableName);
+HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
+HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
+columnFamily1.setMaxVersions(3);
+hTableDescriptor.addFamily(columnFamily1);
+HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
+columnFamily2.setMaxVersions(3);
+hTableDescriptor.addFamily(columnFamily2);
+hTableDescriptor.addCoprocessor(RegionObserverExample.class.getCanonicalName(), path,
+Coprocessor.PRIORITY_USER, null);
+admin.modifyTable(tableName, hTableDescriptor);
+admin.enableTable(tableName);
 ----
-====
 
-The coprocessor framework will try to read the class information from the coprocessor table attribute value.
-The value contains four pieces of information which are separated by the `|` character.
+WARNING: There is no guarantee that the framework will load a given Coprocessor successfully.
+For example, the shell command neither guarantees a jar file exists at a particular location nor
+verifies whether the given class is actually contained in the jar file.
 
-* File path: The jar file containing the coprocessor implementation must be in a location where all region servers can read it.
-  You could copy the file onto the local disk on each region server, but it is recommended to store it in HDFS.
-* Class name: The full class name of the coprocessor.
-* Priority: An integer.
-  The framework will determine the execution sequence of all configured observers registered at the same hook using priorities.
-  This field can be left blank.
-  In that case the framework will assign a default priority value.
-* Arguments: This field is passed to the coprocessor implementation.
 
-.Unload a Coprocessor From a Table Using HBase Shell
-====
+=== Dynamic Unloading
+
+==== Using HBase Shell
+
+. Disable the table.
++
+[source]
+----
+hbase> disable 'users'
 ----
 
-hbase(main):007:0> alter 't1', METHOD => 'table_att_unset',
-hbase(main):008:0*   NAME => 'coprocessor$1'
-Updating all regions with the new schema...
-1/1 regions updated.
-Done.
-0 row(s) in 1.1130 seconds
-
-hbase(main):009:0> describe 't1'
-DESCRIPTION                                                        ENABLED
- {NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false
-  'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION
- S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214
- 7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN
- _MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true
- '}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>
- 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
-  'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C
- ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO
- DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
-1 row(s) in 0.0180 seconds
+. Alter the table to remove the coprocessor.
++
+[source]
 ----
-====
+hbase> alter 'users', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
+----
+
+. Enable the table.
++
+[source]
+----
+hbase> enable 'users'
+----
+
+==== Using the Java API
+
+Reload the table definition without setting the coprocessor value, whether via the `setValue()`
+or the `addCoprocessor()` method. This will remove any coprocessor
+attached to the table.
+
+[source,java]
+----
+TableName tableName = TableName.valueOf("users");
+String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
+Configuration conf = HBaseConfiguration.create();
+Connection connection = ConnectionFactory.createConnection(conf);
+Admin admin = connection.getAdmin();
+admin.disableTable(tableName);
+HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
+HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
+columnFamily1.setMaxVersions(3);
+hTableDescriptor.addFamily(columnFamily1);
+HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
+columnFamily2.setMaxVersions(3);
+hTableDescriptor.addFamily(columnFamily2);
+admin.modifyTable(tableName, hTableDescriptor);
+admin.enableTable(tableName);
+----
+
+In HBase 0.96 and newer, you can instead use the `removeCoprocessor()` method of the
+`HTableDescriptor` class.
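+
+A rough sketch of that approach, assuming the same `users` table and the `RegionObserverExample`
+class used elsewhere in this chapter, might look like the following:
+
+[source,java]
+----
+Configuration conf = HBaseConfiguration.create();
+Connection connection = ConnectionFactory.createConnection(conf);
+Admin admin = connection.getAdmin();
+TableName tableName = TableName.valueOf("users");
+admin.disableTable(tableName);
+// Fetch the current descriptor, drop the coprocessor entry, and push the change back.
+HTableDescriptor hTableDescriptor = admin.getTableDescriptor(tableName);
+hTableDescriptor.removeCoprocessor(RegionObserverExample.class.getCanonicalName());
+admin.modifyTable(tableName, hTableDescriptor);
+admin.enableTable(tableName);
+----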
+
+
+[[cp_example]]
+== Examples
+HBase ships examples for Observer Coprocessors in
+link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/ZooKeeperScanPolicyObserver.html[ZooKeeperScanPolicyObserver]
+and for Endpoint Coprocessors in
+link:http://hbase.apache.org/xref/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.html[RowCountEndpoint].
+
+A more detailed example is given below.
+
+These examples assume a table called `users`, which has two column families `personalDet`
+and `salaryDet`, containing personal and salary details. Below is the graphical representation
+of the `users` table.
+
+.Users Table
+[width="100%",cols="7",options="header,footer"]
+|====================
+| 3+|personalDet  3+|salaryDet
+|*rowkey* |*name* |*lastname* |*dob* |*gross* |*net* |*allowances*
+|admin |Admin |Admin |  3+|
+|cdickens |Charles |Dickens |02/07/1812 |10000 |8000 |2000
+|jverne |Jules |Verne |02/08/1828 |12000 |9000 |3000
+|====================
+
+
+=== Observer Example
+
+The following Observer coprocessor prevents the details of the user `admin` from being
+returned in a `Get` or `Scan` of the `users` table.
+
+. Write a class that extends the
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.html[BaseRegionObserver]
+class.
 
-WARNING: There is no guarantee that the framework will load a given coprocessor successfully.
-For example, the shell command neither guarantees a jar file exists at a particular location nor verifies whether the given class is actually contained in the jar file.
+. Override the `preGetOp()` method (the `preGet()` method is deprecated) to check
+whether the client has queried for the rowkey with value `admin`. If so, return an
+empty result. Otherwise, process the request as normal.
 
-== Check the Status of a Coprocessor
+. Put your code and dependencies in a JAR file.
 
-To check the status of a coprocessor after it has been configured, use the `status` HBase Shell command.
+. Place the JAR in HDFS where HBase can locate it.
 
+. Load the Coprocessor.
+
+. Write a simple program to test it.
+
+The following code implements the above steps:
+
+
+[source,java]
+----
+public class RegionObserverExample extends BaseRegionObserver {
+
+    private static final byte[] ADMIN = Bytes.toBytes("admin");
+    private static final byte[] COLUMN_FAMILY = Bytes.toBytes("details");
+    private static final byte[] COLUMN = Bytes.toBytes("Admin_det");
+    private static final byte[] VALUE = Bytes.toBytes("You can't see Admin details");
+
+    @Override
+    public void preGetOp(final ObserverContext<RegionCoprocessorEnvironment> e, final Get get, final List<Cell> results)
+    throws IOException {
+
+        if (Bytes.equals(get.getRow(),ADMIN)) {
+            Cell c = CellUtil.createCell(get.getRow(),COLUMN_FAMILY, COLUMN,
+            System.currentTimeMillis(), (byte)4, VALUE);
+            results.add(c);
+            e.bypass();
+        }
+    }
+}
+----
+
+Overriding the `preGetOp()` will only work for `Get` operations. You also need to override
+the `preScannerOpen()` method to filter the `admin` row from scan results.
+
+[source,java]
+----
+@Override
+public RegionScanner preScannerOpen(final ObserverContext<RegionCoprocessorEnvironment> e, final Scan scan,
+final RegionScanner s) throws IOException {
+
+    Filter filter = new RowFilter(CompareOp.NOT_EQUAL, new BinaryComparator(ADMIN));
+    scan.setFilter(filter);
+    return s;
+}
 ----
 
-hbase(main):020:0> status 'detailed'
-version 0.92-tm-6
-0 regionsInTransition
-master coprocessors: []
-1 live servers
-    localhost:52761 1328082515520
-        requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995
-        -ROOT-,,0
-            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
-storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
-totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
-        .META.,,1
-            numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
-storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
-totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
-        t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1.
-            numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
-storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
-totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN,
-coprocessors=[AggregateImplementation]
-0 dead servers
+This method works but there is a _side effect_. If the client has used a filter in
+its scan, that filter will be replaced by this filter. Instead, you can explicitly
+remove any `admin` results from the scan:
+
+[source,java]
+----
+@Override
+public boolean postScannerNext(final ObserverContext<RegionCoprocessorEnvironment> e, final InternalScanner s,
+final List<Result> results, final int limit, final boolean hasMore) throws IOException {
+    Result result = null;
+    Iterator<Result> iterator = results.iterator();
+    while (iterator.hasNext()) {
+        result = iterator.next();
+        if (Bytes.equals(result.getRow(), ADMIN)) {
+            // Strip the admin row from the batch of results returned to the client.
+            iterator.remove();
+            break;
+        }
+    }
+    return hasMore;
+}
 ----
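+
+The simple test program from the final step above might look like the following sketch. It
+assumes the observer has been loaded on the `users` table and that the table contains a row
+with the rowkey `admin`:
+
+[source,java]
+----
+Configuration conf = HBaseConfiguration.create();
+Connection connection = ConnectionFactory.createConnection(conf);
+Table table = connection.getTable(TableName.valueOf("users"));
+
+Get get = new Get(Bytes.toBytes("admin"));
+Result result = table.get(get);
+for (Cell cell : result.rawCells()) {
+    // With the observer loaded, this prints the placeholder value
+    // "You can't see Admin details" rather than the real admin data.
+    System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)) + ": "
+        + Bytes.toString(CellUtil.cloneValue(cell)));
+}
+----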
 
+=== Endpoint Example
+
+Still using the `users` table, this example implements an endpoint coprocessor to calculate
+the sum of all employee salaries.
+
+. Create a `.proto` file defining your service.
++
+[source]
+----
+option java_package = "org.myname.hbase.coprocessor.autogenerated";
+option java_outer_classname = "Sum";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+option optimize_for = SPEED;
+message SumRequest {
+    required string family = 1;
+    required string column = 2;
+}
+
+message SumResponse {
+  required int64 sum = 1 [default = 0];
+}
+
+service SumService {
+  rpc getSum(SumRequest)
+    returns (SumResponse);
+}
+----
+
+. Execute the `protoc` command to generate the Java code from the above `.proto` file.
++
+[source]
+----
+$ mkdir src
+$ protoc --java_out=src ./sum.proto
+----
++
+This will generate a file called `Sum.java`.
+
+. Write a class that extends the generated service class, implements the `Coprocessor`
+and `CoprocessorService` interfaces, and overrides the service method.
++
+WARNING: If you load a coprocessor from `hbase-site.xml` and then load the same coprocessor
+again using HBase Shell, it will be loaded a second time. The same class will
+exist twice, and the second instance will have a higher ID (and thus a lower priority).
+The effect is that the duplicate coprocessor is effectively ignored.
++
+[source, java]
+----
+public class SumEndPoint extends SumService implements Coprocessor, CoprocessorService {
+
+    private RegionCoprocessorEnvironment env;
+
+    @Override
+    public Service getService() {
+        return this;
+    }
+
+    @Override
+    public void start(CoprocessorEnvironment env) throws IOException {
+        if (env instanceof RegionCoprocessorEnvironment) {
+            this.env = (RegionCoprocessorEnvironment) env;
+        } else {
+            throw new CoprocessorException("Must be loaded on a table region!");
+        }
+    }
+
+    @Override
+    public void stop(CoprocessorEnvironment env) throws IOException {
+        // do nothing
+    }
+
+    @Override
+    public void getSum(RpcController controller, SumRequest request, RpcCallback<SumResponse> done) {
+        Scan scan = new Scan();
+        scan.addFamily(Bytes.toBytes(request.getFamily()));
+        scan.addColumn(Bytes.toBytes(request.getFamily()), Bytes.toBytes(request.getColumn()));
+
+        SumResponse response = null;
+        InternalScanner scanner = null;
+        try {
+            scanner = env.getRegion().getScanner(scan);
+            List<Cell> results = new ArrayList<>();
+            boolean hasMore = false;
+            long sum = 0L;
+
+            do {
+                hasMore = scanner.next(results);
+                for (Cell cell : results) {
+                    sum = sum + Bytes.toLong(CellUtil.cloneValue(cell));
+                }
+                results.clear();
+            } while (hasMore);
+
+            response = SumResponse.newBuilder().setSum(sum).build();
+        } catch (IOException ioe) {
+            ResponseConverter.setControllerException(controller, ioe);
+        } finally {
+            if (scanner != null) {
+                try {
+                    scanner.close();
+                } catch (IOException ignored) {}
+            }
+        }
+
+        done.run(response);
+    }
+}
+----
++
+[source, java]
+----
+Configuration conf = HBaseConfiguration.create();
+// Use the below code for HBase version 1.x.x or above.
+Connection connection = ConnectionFactory.createConnection(conf);
+TableName tableName = TableName.valueOf("users");
+Table table = connection.getTable(tableName);
+
+// Use the below code for HBase version 0.98.xx or below.
+// HConnection connection = HConnectionManager.createConnection(conf);
+// HTableInterface table = connection.getTable("users");
+
+final SumRequest request = SumRequest.newBuilder().setFamily("salaryDet").setColumn("gross")
+                            .build();
+try {
+    Map<byte[], Long> results = table.coprocessorService(SumService.class, null, null,
+        new Batch.Call<SumService, Long>() {
+            @Override
+            public Long call(SumService aggregate) throws IOException {
+                BlockingRpcCallback<SumResponse> rpcCallback = new BlockingRpcCallback<>();
+                aggregate.getSum(null, request, rpcCallback);
+                SumResponse response = rpcCallback.get();
+                return response.hasSum() ? response.getSum() : 0L;
+            }
+        });
+    for (Long sum : results.values()) {
+        System.out.println("Sum = " + sum);
+    }
+} catch (ServiceException e) {
+    e.printStackTrace();
+} catch (Throwable e) {
+    e.printStackTrace();
+}
+----
+
+. Load the Coprocessor (a sketch of loading through the table descriptor API follows below).
+
+. Write client code to call the Coprocessor, as shown in the client snippet above.
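+
+The loading step might look like the following sketch, which loads the endpoint dynamically
+through the table descriptor API. The HDFS path and coprocessor class name used here are
+assumptions for illustration only; loading through the HBase Shell or `hbase-site.xml`, as
+described earlier in this chapter, works just as well.
+
+[source,java]
+----
+// Hypothetical JAR location and class name; adjust for your own deployment.
+Path jarPath = new Path("hdfs:///user/hbase/coprocessors/sum-endpoint.jar");
+String className = "org.myname.hbase.coprocessor.SumEndPoint";
+
+Admin admin = connection.getAdmin();
+admin.disableTable(tableName);
+
+// Attach the coprocessor to the table and re-enable it.
+HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
+descriptor.addCoprocessor(className, jarPath, Coprocessor.PRIORITY_USER, null);
+admin.modifyTable(tableName, descriptor);
+
+admin.enableTable(tableName);
+----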
+
+
+== Guidelines for Deploying a Coprocessor
+
+Bundling Coprocessors::
+  You can bundle all classes for a coprocessor into a
+  single JAR on the RegionServer's classpath, for easy deployment. Otherwise,
+  place all dependencies  on the RegionServer's classpath so that they can be
+  loaded during RegionServer start-up.  The classpath for a RegionServer is set
+  in the RegionServer's `hbase-env.sh` file.
+Automating Deployment::
+  You can use a tool such as Puppet, Chef, or
+  Ansible to ship the JAR for the coprocessor  to the required location on your
+  RegionServers' filesystems and restart each RegionServer,  to automate
+  coprocessor deployment. Details for such set-ups are out of scope of  this
+  document.
+Updating a Coprocessor::
+  Deploying a new version of a given coprocessor is not as simple as disabling it,
+  replacing the JAR, and re-enabling the coprocessor. This is because you cannot
+  reload a class in a JVM unless you delete all the current references to it.
+  Since the current JVM has reference to the existing coprocessor, you must restart
+  the JVM, by restarting the RegionServer, in order to replace it. This behavior
+  is not expected to change.
+Coprocessor Logging::
+  The Coprocessor framework does not provide an API for logging beyond standard Java
+  logging.
+Coprocessor Configuration::
+  If you do not want to load coprocessors from the HBase Shell, you can add their configuration
+  properties to `hbase-site.xml`. In <<load_coprocessor_in_shell>>, two arguments are
+  set: `arg1=1,arg2=2`. These could have been added to `hbase-site.xml` as follows:
+[source,xml]
+----
+<property>
+  <name>arg1</name>
+  <value>1</value>
+</property>
+<property>
+  <name>arg2</name>
+  <value>2</value>
+</property>
+----
+Then the coprocessor can read these properties from its environment, as shown in the sketch
+after the following client code, which you can use to verify the coprocessor's behavior:
+[source,java]
+----
+Configuration conf = HBaseConfiguration.create();
+// Use the below code for HBase version 1.x.x or above.
+Connection connection = ConnectionFactory.createConnection(conf);
+TableName tableName = TableName.valueOf("users");
+Table table = connection.getTable(tableName);
+
+// Use the below code for HBase version 0.98.xx or below.
+// HConnection connection = HConnectionManager.createConnection(conf);
+// HTableInterface table = connection.getTable("users");
+
+Get get = new Get(Bytes.toBytes("admin"));
+Result result = table.get(get);
+for (Cell c : result.rawCells()) {
+    System.out.println(Bytes.toString(CellUtil.cloneRow(c))
+        + "==> " + Bytes.toString(CellUtil.cloneFamily(c))
+        + "{" + Bytes.toString(CellUtil.cloneQualifier(c))
+        + ":" + Bytes.toLong(CellUtil.cloneValue(c)) + "}");
+}
+Scan scan = new Scan();
+ResultScanner scanner = table.getScanner(scan);
+for (Result res : scanner) {
+    for (Cell c : res.rawCells()) {
+        System.out.println(Bytes.toString(CellUtil.cloneRow(c))
+        + " ==> " + Bytes.toString(CellUtil.cloneFamily(c))
+        + " {" + Bytes.toString(CellUtil.cloneQualifier(c))
+        + ":" + Bytes.toLong(CellUtil.cloneValue(c))
+        + "}");
+    }
+}
+----
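+
+Inside the coprocessor itself, these properties can be read from the environment's
+`Configuration`. The following sketch shows one way to do this in the `start()` method of the
+earlier `RegionObserverExample`; the `LOG` field illustrates that coprocessors simply use
+ordinary Java logging facilities such as commons-logging, as noted under Coprocessor Logging
+above:
+
+[source,java]
+----
+private static final Log LOG = LogFactory.getLog(RegionObserverExample.class);
+
+@Override
+public void start(CoprocessorEnvironment env) throws IOException {
+    // Properties from hbase-site.xml are visible through the environment's Configuration.
+    String arg1 = env.getConfiguration().get("arg1");
+    String arg2 = env.getConfiguration().get("arg2");
+    LOG.info("Loaded coprocessor with arg1=" + arg1 + " and arg2=" + arg2);
+}
+----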
+
+
+
+
 == Monitor Time Spent in Coprocessors
 
-HBase 0.98.5 introduced the ability to monitor some statistics relating to the amount of time spent executing a given coprocessor.
-You can see these statistics via the HBase Metrics framework (see <<hbase_metrics>> or the Web UI for a given Region Server, via the _Coprocessor Metrics_ tab.
-These statistics are valuable for debugging and benchmarking the performance impact of a given coprocessor on your cluster.
+HBase 0.98.5 introduced the ability to monitor some statistics relating to the amount of time
+spent executing a given Coprocessor.
+You can see these statistics via the HBase Metrics framework (see <<hbase_metrics>>) or the Web UI
+for a given Region Server, via the _Coprocessor Metrics_ tab.
+These statistics are valuable for debugging and benchmarking the performance impact of a given
+Coprocessor on your cluster.
 Tracked statistics include min, max, average, and 90th, 95th, and 99th percentile.
 All times are shown in milliseconds.
-The statistics are calculated over coprocessor execution samples recorded during the reporting interval, which is 10 seconds by default.
+The statistics are calculated over Coprocessor execution samples recorded during the reporting
+interval, which is 10 seconds by default.
 The metrics sampling rate is as described in <<hbase_metrics>>.
 
 .Coprocessor Metrics UI

http://git-wip-us.apache.org/repos/asf/hbase/blob/6cb8a436/src/main/asciidoc/_chapters/datamodel.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/datamodel.adoc b/src/main/asciidoc/_chapters/datamodel.adoc
index b76adc8..30465fb 100644
--- a/src/main/asciidoc/_chapters/datamodel.adoc
+++ b/src/main/asciidoc/_chapters/datamodel.adoc
@@ -93,7 +93,7 @@ The colon character (`:`) delimits the column family from the column family _qua
 |===
 |Row Key |Time Stamp  |ColumnFamily `contents` |ColumnFamily `anchor`|ColumnFamily `people`
 |"com.cnn.www" |t9    | |anchor:cnnsi.com = "CNN"   |
-|"com.cnn.www" |t8    | |anchor:my.look.ca = "CNN.com" |  
+|"com.cnn.www" |t8    | |anchor:my.look.ca = "CNN.com" |
 |"com.cnn.www" |t6  | contents:html = "<html>..."    | |
 |"com.cnn.www" |t5  | contents:html = "<html>..."    | |
 |"com.cnn.www" |t3  | contents:html = "<html>..."    | |
@@ -171,7 +171,7 @@ For more information about the internals of how Apache HBase stores data, see <<
 A namespace is a logical grouping of tables analogous to a database in relational database systems.
 This abstraction lays the groundwork for upcoming multi-tenancy related features:
 
-* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (ie regions, tables) a namespace can consume.
+* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume.
 * Namespace Security Administration (link:https://issues.apache.org/jira/browse/HBASE-9206[HBASE-9206]) - Provide another level of security administration for tenants.
 * Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a coarse level of isolation.
 
@@ -257,7 +257,7 @@ For example, the columns _courses:history_ and _courses:math_ are both members o
 The colon character (`:`) delimits the column family from the column family qualifier.
 The column family prefix must be composed of _printable_ characters.
 The qualifying tail, the column family _qualifier_, can be made of any arbitrary bytes.
-Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up an running.
+Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up and running.
 
 Physically, all column family members are stored together on the filesystem.
 Because tunings and storage specifications are done at the column family level, it is advised that all column family members have the same general access pattern and size characteristics.
@@ -279,7 +279,7 @@ Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hba
 
 === Put
 
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])[Table.batch] (non-writeBuffer).
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List,%20java.lang.Object%5B%5D)[Table.batch] (non-writeBuffer).
 
 [[scan]]
 === Scans
@@ -542,6 +542,7 @@ Thus, while HBase can support not only a wide number of columns per row, but a h
 The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
 For more information about how HBase stores data internally, see <<keyvalue,keyvalue>>.
 
+[[joins]]
 == Joins
 
 Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't, at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated in this chapter, the read data model operations in HBase are Get and Scan.
@@ -552,7 +553,7 @@ hash-joins). So which is the best approach? It depends on what you are trying to
 
 == ACID
 
-See link:http://hbase.apache.org/acid-semantics.html[ACID Semantics].
+See link:/acid-semantics.html[ACID Semantics].
 Lars Hofhansl has also written a note on link:http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html[ACID in HBase].
 
 ifdef::backend-docbook[]