Posted to commits@hbase.apache.org by nd...@apache.org on 2015/05/12 00:46:56 UTC

[08/18] hbase git commit: HBASE-13665 Fix docs and site building on branch-1

http://git-wip-us.apache.org/repos/asf/hbase/blob/33fe79cf/src/main/asciidoc/_chapters/orca.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/orca.adoc b/src/main/asciidoc/_chapters/orca.adoc
new file mode 100644
index 0000000..1816b1a
--- /dev/null
+++ b/src/main/asciidoc/_chapters/orca.adoc
@@ -0,0 +1,38 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[appendix]
+[[orca]]
+== Apache HBase Orca
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+.Apache HBase Orca
+image::jumping-orca_rotated_25percent.png[]
+
+link:https://issues.apache.org/jira/browse/HBASE-4920[An Orca is the Apache HBase mascot.] See NOTICES.txt.
+We got our Orca logo from http://www.vectorfree.com/jumping-orca; it is licensed under Creative Commons Attribution 3.0 (see https://creativecommons.org/licenses/by/3.0/us/).
+We modified the logo by stripping the colored background, inverting it, and then rotating it slightly.
+
+:numbered:

http://git-wip-us.apache.org/repos/asf/hbase/blob/33fe79cf/src/main/asciidoc/_chapters/other_info.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/other_info.adoc b/src/main/asciidoc/_chapters/other_info.adoc
new file mode 100644
index 0000000..046b747
--- /dev/null
+++ b/src/main/asciidoc/_chapters/other_info.adoc
@@ -0,0 +1,80 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[appendix]
+[[other.info]]
+== Other Information About HBase
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+[[other.info.videos]]
+=== HBase Videos
+
+.Introduction to HBase 
+* link:http://www.cloudera.com/content/cloudera/en/resources/library/presentation/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon.html[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). 
+* link:http://www.cloudera.com/videos/intorduction-hbase-todd-lipcon[Introduction to HBase] by Todd Lipcon (2010).         
+
+link:http://www.cloudera.com/videos/hadoop-world-2011-presentation-video-building-realtime-big-data-services-at-facebook-with-hadoop-and-hbase[Building Real Time Services at Facebook with HBase] by Jonathan Gray (Hadoop World 2011).
+
+link:http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop[HBase and Hadoop, Mixing Real-Time and Batch Processing at StumbleUpon] by JD Cryans (Hadoop World 2010). 
+
+[[other.info.pres]]
+=== HBase Presentations (Slides)
+
+link:http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-advanced-hbase-schema-design.html[Advanced HBase Schema Design] by Lars George (Hadoop World 2011). 
+
+link:http://www.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011). 
+
+link:http://www.slideshare.net/cloudera/hw09-practical-h-base-getting-the-most-from-your-h-base-install[Getting The Most From Your HBase Install] by Ryan Rawson, Jonathan Gray (Hadoop World 2009). 
+
+[[other.info.papers]]
+=== HBase Papers
+
+link:http://research.google.com/archive/bigtable.html[BigTable] by Google (2006). 
+
+link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS Locality] by Lars George (2010). 
+
+link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases] by Ian Varley (2009). 
+
+[[other.info.sites]]
+=== HBase Sites
+
+link:http://www.cloudera.com/blog/category/hbase/[Cloudera's HBase Blog] has a lot of links to useful HBase information. 
+
+* link:http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/[CAP Confusion] is a relevant entry for background information on distributed storage systems.        
+
+link:http://wiki.apache.org/hadoop/HBase/HBasePresentations[HBase Wiki] has a page with a number of presentations. 
+
+link:http://refcardz.dzone.com/refcardz/hbase[HBase RefCard] from DZone. 
+
+[[other.info.books]]
+=== HBase Books
+
+link:http://shop.oreilly.com/product/0636920014348.do[HBase:  The Definitive Guide] by Lars George. 
+
+[[other.info.books.hadoop]]
+=== Hadoop Books
+
+link:http://shop.oreilly.com/product/9780596521981.do[Hadoop:  The Definitive Guide] by Tom White. 
+
+:numbered:

http://git-wip-us.apache.org/repos/asf/hbase/blob/33fe79cf/src/main/asciidoc/_chapters/performance.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/performance.adoc b/src/main/asciidoc/_chapters/performance.adoc
new file mode 100644
index 0000000..2155d52
--- /dev/null
+++ b/src/main/asciidoc/_chapters/performance.adoc
@@ -0,0 +1,890 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+
+[[performance]]
+= Apache HBase Performance Tuning
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+[[perf.os]]
+== Operating System
+
+[[perf.os.ram]]
+=== Memory
+
+RAM, RAM, RAM.
+Don't starve HBase.
+
+[[perf.os.64]]
+=== 64-bit
+
+Use a 64-bit platform (and 64-bit JVM).
+
+[[perf.os.swap]]
+=== Swapping
+
+Watch out for swapping.
+Set `swappiness` to 0.
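+
+For example, on a typical Linux host you might check and lower `swappiness` as follows (a minimal sketch; file locations can vary by distribution):
+
+[source,bourne]
+----
+# check the current setting
+cat /proc/sys/vm/swappiness
+
+# set swappiness to 0 for the running system
+sysctl vm.swappiness=0
+
+# persist the setting across reboots
+echo "vm.swappiness = 0" >> /etc/sysctl.conf
+----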
+
+[[perf.network]]
+== Network
+
+Perhaps the most important factor in avoiding network issues degrading Hadoop and HBase performance is the switching hardware that is used. Decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more).
+
+Important items to consider:
+
+* Switching capacity of the device
+* Number of systems connected
+* Uplink capacity
+
+[[perf.network.1switch]]
+=== Single Switch
+
+The single most important factor in this configuration is that the switching capacity of the hardware is capable of handling the traffic which can be generated by all systems connected to the switch.
+Some lower priced commodity hardware can have a slower switching capacity than could be utilized by a full switch.
+
+[[perf.network.2switch]]
+=== Multiple Switches
+
+Multiple switches are a potential pitfall in the architecture.
+The most common configuration of lower priced hardware is a simple 1Gbps uplink from one switch to another.
+This often overlooked pinch point can easily become a bottleneck for cluster communication.
+Especially with MapReduce jobs that are both reading and writing a lot of data, the communication across this uplink can easily become saturated.
+
+Mitigation of this issue is fairly simple and can be accomplished in multiple ways:
+
+* Use appropriate hardware for the scale of the cluster which you're attempting to build.
+* Use larger single-switch configurations, i.e., a single 48-port switch as opposed to two 24-port switches.
+* Configure port trunking for uplinks to utilize multiple interfaces to increase cross switch bandwidth.
+
+[[perf.network.multirack]]
+=== Multiple Racks
+
+Multiple rack configurations carry the same potential issues as multiple switches, and can suffer performance degradation from two main areas:
+
+* Poor switch capacity performance
+* Insufficient uplink to another rack
+
+If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks.
+The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.
+The downside of this method however, is in the overhead of ports that could potentially be used.
+For example, creating an 8Gbps port channel from rack A to rack B, using 8 of your 24 ports to communicate between racks, gives you a poor ROI; using too few ports, however, can mean you're not getting the most out of your cluster.
+
+Using 10GbE links between racks will greatly increase performance, and assuming your switches support a 10GbE uplink or allow for an expansion card, it will also let you save your ports for machines rather than uplinks.
+
+[[perf.network.ints]]
+=== Network Interfaces
+
+Are all the network interfaces functioning correctly? Are you sure? See the Troubleshooting Case Study in <<casestudies.slownode>>.
+
+[[perf.network.call_me_maybe]]
+=== Network Consistency and Partition Tolerance
+The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three characteristics:
+
+- *C*onsistency -- all nodes see the same data. 
+- *A*vailability -- every request receives a response about whether it succeeded or failed.
+- *P*artition tolerance -- the system continues to operate even if some of its components become unavailable to the others.
+
+When a decision has to be made, HBase favors consistency and partition tolerance. Coda Hale explains why partition tolerance is so important in http://codahale.com/you-cant-sacrifice-partition-tolerance/.
+
+Robert Yokota used an automated testing framework called link:https://aphyr.com/tags/jepsen[Jepsen] to test HBase's partition tolerance in the face of network partitions, using techniques modeled after Aphyr's link:https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions[Call Me Maybe] series. The results, available as a link:http://eng.yammer.com/call-me-maybe-hbase/[blog post] and an link:http://eng.yammer.com/call-me-maybe-hbase-addendum/[addendum], show that HBase performs correctly.
+
+[[jvm]]
+== Java
+
+[[gc]]
+=== The Garbage Collector and Apache HBase
+
+[[gcpause]]
+==== Long GC pauses
+
+In his presentation, link:http://www.slideshare.net/cloudera/hbase-hug-presentation[Avoiding Full GCs with MemStore-Local Allocation Buffers], Todd Lipcon describes two cases of stop-the-world garbage collections common in HBase, especially during loading: CMS failure modes and old-generation heap fragmentation.
+
+To address the first, start the CMS earlier than the default by adding `-XX:CMSInitiatingOccupancyFraction` and setting it down from its default.
+Start at 60 or 70 percent (the lower you bring the threshold, the more GC is done and the more CPU is used). To address the second issue, fragmentation, Todd added an experimental facility,
+MSLAB, that must be explicitly enabled in Apache HBase 0.90.x (it defaults to _on_ in Apache HBase 0.92.x). Set `hbase.hregion.memstore.mslab.enabled` to true in your `Configuration`.
+See the cited slides for background and detail.
+The latest JVMs do better with regard to fragmentation, so make sure you are running a recent release.
+Read down in the message, link:http://osdir.com/ml/hotspot-gc-use/2011-11/msg00002.html[Identifying concurrent mode failures caused by fragmentation].
+Be aware that when enabled, each MemStore instance will occupy at least one MSLAB instance of memory.
+If you have thousands of regions, or lots of regions each with many column families, this MSLAB allocation may account for a good portion of your heap and in extreme cases cause you to OOME.
+Disable MSLAB in this case, lower the amount of memory it uses, or float fewer regions per server.
+
+If you have a write-heavy workload, check out link:https://issues.apache.org/jira/browse/HBASE-8163[HBASE-8163 MemStoreChunkPool: An improvement for JAVA GC when using MSLAB].
+It describes configurations to lower the amount of young GC during write-heavy loadings.
+If you do not have HBASE-8163 installed, and you are trying to improve your young GC times, one trick to consider -- courtesy of our Liang Xie -- is to set the GC config `-XX:PretenureSizeThreshold` in _hbase-env.sh_ to be just smaller than the size of `hbase.hregion.memstore.mslab.chunksize` so MSLAB allocations happen in the tenured space directly rather than first in the young gen.
+You'd do this because these MSLAB allocations are likely to make it to the old gen anyway; rather than pay the price of copies between the s0 and s1 survivor spaces followed by the copy up from young to old gen after the MSLABs have achieved sufficient tenure, save a bit of YGC churn and allocate in the old gen directly.
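+
+As a concrete illustration, the GC settings discussed above might be applied in _hbase-env.sh_ roughly as follows (illustrative values only, not recommendations; tune them for your own heap size and workload):
+
+[source,bourne]
+----
+# hbase-env.sh -- example GC options; values are illustrative
+export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
+  -XX:CMSInitiatingOccupancyFraction=70 \
+  -XX:+UseCMSInitiatingOccupancyOnly \
+  -XX:PretenureSizeThreshold=2097088"  # just under a 2 MB MSLAB chunk size
+----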
+
+For more information about GC logs, see <<trouble.log.gc>>.
+
+Consider also enabling the off-heap Block Cache.
+This has been shown to mitigate GC pause times.
+See <<block.cache>>.
+
+[[perf.configurations]]
+== HBase Configurations
+
+See <<recommended_configurations>>.
+
+[[perf.compactions.and.splits]]
+=== Managing Compactions
+
+For larger systems, managing compactions and splits may be something you want to consider; see <<compaction>> and <<manual_region_splitting_decisions>>.
+
+[[perf.handlers]]
+=== `hbase.regionserver.handler.count`
+
+See <<hbase.regionserver.handler.count>>.
+
+[[perf.hfile.block.cache.size]]
+=== `hfile.block.cache.size`
+
+See <<hfile.block.cache.size>>.
+A memory setting for the RegionServer process.
+
+[[blockcache.prefetch]]
+=== Prefetch Option for Blockcache
+
+link:https://issues.apache.org/jira/browse/HBASE-9857[HBASE-9857] adds a new option to prefetch HFile contents when opening the BlockCache, if a Column family or RegionServer property is set.
+This option is available for HBase 0.98.3 and later.
+The purpose is to warm the BlockCache as rapidly as possible after the cache is opened, using in-memory table data, and not counting the prefetching as cache misses.
+This is great for fast reads, but is not a good idea if the data to be preloaded will not fit into the BlockCache.
+It is useful for tuning the IO impact of prefetching versus the time before all data blocks are in cache.
+
+To enable prefetching on a given column family, you can use HBase Shell or use the API.
+
+.Enable Prefetch Using HBase Shell
+====
+----
+hbase> create 'MyTable', { NAME => 'myCF', PREFETCH_BLOCKS_ON_OPEN => 'true' }
+----
+====
+
+.Enable Prefetch Using the API
+====
+[source,java]
+----
+
+// ...
+HTableDescriptor tableDesc = new HTableDescriptor("myTable");
+HColumnDescriptor cfDesc = new HColumnDescriptor("myCF");
+cfDesc.setPrefetchBlocksOnOpen(true);
+tableDesc.addFamily(cfDesc);
+// ...
+----
+====
+
+See the API documentation for link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html[CacheConfig].
+
+[[perf.rs.memstore.size]]
+=== `hbase.regionserver.global.memstore.size`
+
+See <<hbase.regionserver.global.memstore.size>>.
+This memory setting is often adjusted for the RegionServer process depending on needs.
+
+[[perf.rs.memstore.size.lower.limit]]
+=== `hbase.regionserver.global.memstore.size.lower.limit`
+
+See <<hbase.regionserver.global.memstore.size.lower.limit>>.
+This memory setting is often adjusted for the RegionServer process depending on needs.
+
+[[perf.hstore.blockingstorefiles]]
+=== `hbase.hstore.blockingStoreFiles`
+
+See <<hbase.hstore.blockingstorefiles>>.
+If there is blocking in the RegionServer logs, increasing this can help.
+
+[[perf.hregion.memstore.block.multiplier]]
+=== `hbase.hregion.memstore.block.multiplier`
+
+See <<hbase.hregion.memstore.block.multiplier>>.
+If there is enough RAM, increasing this can help.
+
+[[hbase.regionserver.checksum.verify.performance]]
+=== `hbase.regionserver.checksum.verify`
+
+Have HBase write the checksum into the datablock and save having to do the checksum seek whenever you read.
+
+See <<hbase.regionserver.checksum.verify>>, <<hbase.hstore.bytes.per.checksum>> and <<hbase.hstore.checksum.algorithm>>. For more information see the release note on link:https://issues.apache.org/jira/browse/HBASE-5074[HBASE-5074 support checksums in HBase block cache].
+
+=== Tuning `callQueue` Options
+
+link:https://issues.apache.org/jira/browse/HBASE-11355[HBASE-11355] introduces several callQueue tuning mechanisms which can increase performance.
+See the JIRA for some benchmarking information.
+
+To increase the number of callqueues, set `hbase.ipc.server.num.callqueue` to a value greater than `1`.
+To split the callqueue into separate read and write queues, set `hbase.ipc.server.callqueue.read.ratio` to a value between `0` and `1`.
+This factor weights the queues toward writes (if below .5) or reads (if above .5). Another way to say this is that the factor determines what percentage of the split queues are used for reads.
+The following examples illustrate some of the possibilities.
+Note that you always have at least one write queue, no matter what setting you use.
+
+* The default value of `0` does not split the queue.
+* A value of `.3` uses 30% of the queues for reading and 70% for writing.
+  Given a value of `10` for `hbase.ipc.server.num.callqueue`, 3 queues would be used for reads and 7 for writes.
+* A value of `.5` uses the same number of read queues and write queues.
+  Given a value of `10` for `hbase.ipc.server.num.callqueue`, 5 queues would be used for reads and 5 for writes.
+* A value of `.6` uses 60% of the queues for reading and 40% for writing.
+  Given a value of `10` for `hbase.ipc.server.num.callqueue`, 7 queues would be used for reads and 3 for writes.
+* A value of `1.0` uses one queue to process write requests, and all other queues process read requests.
+  A value higher than `1.0` has the same effect as a value of `1.0`.
+  Given a value of `10` for `hbase.ipc.server.num.callqueue`, 9 queues would be used for reads and 1 for writes.
+
+You can also split the read queues so that separate queues are used for short reads (from Get operations) and long reads (from Scan operations), by setting the `hbase.ipc.server.callqueue.scan.ratio` option.
+This option is a factor between 0 and 1, which determines the ratio of read queues used for Gets and Scans.
+More queues are used for Gets if the value is below `.5` and more are used for scans if the value is above `.5`.
+No matter what setting you use, at least one read queue is used for Get operations.
+
+* A value of `0` does not split the read queue.
+* A value of `.3` uses 70% of the read queues for Gets and 30% for Scans.
+  Given a value of `20` for `hbase.ipc.server.num.callqueue` and a value of `.5` for `hbase.ipc.server.callqueue.read.ratio`, 10 queues would be used for reads, out of those 10, 7 would be used for Gets and 3 for Scans.
+* A value of `.5` uses half the read queues for Gets and half for Scans.
+  Given a value of `20` for `hbase.ipc.server.num.callqueue` and a value of `.5` for `hbase.ipc.server.callqueue.read.ratio`, 10 queues would be used for reads, out of those 10, 5 would be used for Gets and 5 for Scans.
+* A value of `.6` uses 40% of the read queues for Gets and 60% for Scans.
+  Given a value of `20` for `hbase.ipc.server.num.callqueue` and a value of `.5` for `hbase.ipc.server.callqueue.read.ratio`, 10 queues would be used for reads, out of those 10, 3 would be used for Gets and 7 for Scans.
+* A value of `1.0` uses all but one of the read queues for Scans.
+  Given a value of `20` for `hbase.ipc.server.num.callqueue` and a value of `.5` for `hbase.ipc.server.callqueue.read.ratio`, 10 queues would be used for reads, out of those 10, 1 would be used for Gets and 9 for Scans.
+
+You can use the new option `hbase.ipc.server.callqueue.handler.factor` to programmatically tune the number of queues:
+
+* A value of `0` uses a single shared queue between all the handlers.
+* A value of `1` uses a separate queue for each handler.
+* A value between `0` and `1` tunes the number of queues against the number of handlers.
+  For instance, a value of `.5` shares one queue between each two handlers.
++
+Having more queues, such as in a situation where you have one queue per handler, reduces contention when adding a task to a queue or selecting it from a queue.
+The trade-off is that if you have some queues with long-running tasks, a handler may end up waiting to execute from that queue rather than processing another queue which has waiting tasks.
+
+
+For these values to take effect on a given RegionServer, the RegionServer must be restarted.
+These parameters are intended for testing purposes and should be used carefully.
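+
+A minimal _hbase-site.xml_ sketch combining these options (the values shown are examples only, not recommendations):
+
+[source,xml]
+----
+<property>
+  <name>hbase.ipc.server.num.callqueue</name>
+  <value>10</value>
+</property>
+<property>
+  <name>hbase.ipc.server.callqueue.read.ratio</name>
+  <value>0.6</value>  <!-- more queues for reads than for writes -->
+</property>
+<property>
+  <name>hbase.ipc.server.callqueue.scan.ratio</name>
+  <value>0.3</value>  <!-- of the read queues, 30% for Scans -->
+</property>
+----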
+
+[[perf.zookeeper]]
+== ZooKeeper
+
+See <<zookeeper>> for information on configuring ZooKeeper, and see the part about having a dedicated disk.
+
+[[perf.schema]]
+== Schema Design
+
+[[perf.number.of.cfs]]
+=== Number of Column Families
+
+See <<number.of.cfs>>.
+
+[[perf.schema.keys]]
+=== Key and Attribute Lengths
+
+See <<keysize>>.
+See also <<perf.compression.however>> for compression caveats.
+
+[[schema.regionsize]]
+=== Table RegionSize
+
+The regionsize can be set on a per-table basis via `setMaxFileSize` on link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor] when certain tables require different regionsizes than the configured default regionsize.
+
+See <<ops.capacity.regions>> for more information.
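+
+For example, assuming the 0.98/1.0 client API, a per-table region size might be set like this (the table name and size are illustrative):
+
+[source,java]
+----
+HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("myTable"));
+tableDesc.setMaxFileSize(10L * 1024 * 1024 * 1024);  // 10 GB regions for this table only
+----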
+
+[[schema.bloom]]
+=== Bloom Filters
+
+A Bloom filter, named for its creator, Burton Howard Bloom, is a data structure which is designed to predict whether a given element is a member of a set of data.
+A positive result from a Bloom filter is not always accurate, but a negative result is guaranteed to be accurate.
+Bloom filters are designed to be "accurate enough" for sets of data which are so large that conventional hashing mechanisms would be impractical.
+For more information about Bloom filters in general, refer to http://en.wikipedia.org/wiki/Bloom_filter.
+
+In terms of HBase, Bloom filters provide a lightweight in-memory structure to reduce the number of disk reads for a given Get operation (Bloom filters do not work with Scans) to only the StoreFiles likely to contain the desired Row.
+The potential performance gain increases with the number of parallel reads.
+
+The Bloom filters themselves are stored in the metadata of each HFile and never need to be updated.
+When an HFile is opened because a region is deployed to a RegionServer, the Bloom filter is loaded into memory.
+
+HBase includes some tuning mechanisms for folding the Bloom filter to reduce the size and keep the false positive rate within a desired range.
+
+Bloom filters were introduced in link:https://issues.apache.org/jira/browse/HBASE-1200[HBASE-1200].
+Since HBase 0.96, row-based Bloom filters are enabled by default.
+(link:https://issues.apache.org/jira/browse/HBASE-8450[HBASE-8450])
+
+For more information on Bloom filters in relation to HBase, see <<blooms>>, or the following Quora discussion: link:http://www.quora.com/How-are-bloom-filters-used-in-HBase[How are bloom filters used in HBase?].
+
+[[bloom.filters.when]]
+==== When To Use Bloom Filters
+
+Since HBase 0.96, row-based Bloom filters are enabled by default.
+You may choose to disable them or to change some tables to use row+column Bloom filters, depending on the characteristics of your data and how it is loaded into HBase.
+
+To determine whether Bloom filters could have a positive impact, check the value of `blockCacheHitRatio` in the RegionServer metrics.
+If Bloom filters are enabled, the value of `blockCacheHitRatio` should increase, because the Bloom filter is filtering out blocks that are definitely not needed.
+
+You can choose to enable Bloom filters for a row or for a row+column combination.
+If you generally scan entire rows, the row+column combination will not provide any benefit.
+A row-based Bloom filter can operate on a row+column Get, but not the other way around.
+However, if you have a large number of column-level Puts, such that a row may be present in every StoreFile, a row-based filter will always return a positive result and provide no benefit.
+Unless you have one column per row, row+column Bloom filters require more space, in order to store more keys.
+Bloom filters work best when the size of each data entry is at least a few kilobytes in size.
+
+Bloom filter overhead is reduced when your data is stored in a few larger StoreFiles, avoiding extra disk IO during low-level scans to find a specific row.
+
+Bloom filters need to be rebuilt upon deletion, so they may not be appropriate in environments with a large number of deletions.
+
+==== Enabling Bloom Filters
+
+Bloom filters are enabled on a Column Family.
+You can do this with HBase Shell, or programmatically by calling the `setBloomFilterType` method of HColumnDescriptor.
+Valid values are `NONE` (the default), `ROW`, or `ROWCOL`.
+See <<bloom.filters.when>> for more information on `ROW` versus `ROWCOL`.
+See also the API documentation for link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
+
+The following example creates a table and enables a ROWCOL Bloom filter on the `colfam1` column family.
+
+----
+
+hbase> create 'mytable',{NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}
+----
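+
+The same setting can be made programmatically; a minimal sketch assuming an open `Admin` instance (table and family names are illustrative):
+
+[source,java]
+----
+HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("mytable"));
+HColumnDescriptor cfDesc = new HColumnDescriptor("colfam1");
+cfDesc.setBloomFilterType(BloomType.ROWCOL);  // NONE, ROW, or ROWCOL
+tableDesc.addFamily(cfDesc);
+admin.createTable(tableDesc);
+----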
+
+==== Configuring Server-Wide Behavior of Bloom Filters
+
+You can configure the following settings in the _hbase-site.xml_.
+
+[cols="1,1,1", options="header"]
+|===
+| Parameter
+| Default
+| Description
+
+| io.hfile.bloom.enabled
+| yes
+| Set to no to kill bloom filters server-wide if something goes wrong
+
+| io.hfile.bloom.error.rate
+| .01
+| The average false positive rate for bloom filters. Folding is used to
+                  maintain the false positive rate. Expressed as a decimal representation of a
+                  percentage.
+
+| io.hfile.bloom.max.fold
+| 7
+| The guaranteed maximum fold rate. Changing this setting should not be
+                  necessary and is not recommended.
+
+| io.storefile.bloom.max.keys
+| 128000000
+| For default (single-block) Bloom filters, this specifies the maximum number of keys.
+
+| io.storefile.delete.family.bloom.enabled
+| true
+| Master switch to enable Delete Family Bloom filters and store them in the StoreFile.
+
+| io.storefile.bloom.block.size
+| 65536
+| Target Bloom block size. Bloom filter blocks of approximately this size
+                  are interleaved with data blocks.
+
+| hfile.block.bloom.cacheonwrite
+| false
+| Enables cache-on-write for inline blocks of a compound Bloom filter.
+|===
+
+[[schema.cf.blocksize]]
+=== ColumnFamily BlockSize
+
+The blocksize can be configured for each ColumnFamily in a table, and defaults to 64k.
+Larger cell values require larger blocksizes.
+There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved).
+
+See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] and <<store>> for more information.
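+
+For example, a larger block size for a family with large cells might be set as follows (the family name and size are illustrative):
+
+[source,java]
+----
+HColumnDescriptor cfDesc = new HColumnDescriptor("myCF");
+cfDesc.setBlocksize(128 * 1024);  // 128k, up from the 64k default
+----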
+
+[[cf.in.memory]]
+=== In-Memory ColumnFamilies
+
+ColumnFamilies can optionally be defined as in-memory.
+Data is still persisted to disk, just like any other ColumnFamily.
+In-memory blocks have the highest priority in the <<block.cache>>, but it is not a guarantee that the entire table will be in memory.
+
+See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.
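+
+A sketch of marking a family in-memory via the API (the family name is illustrative):
+
+[source,java]
+----
+HColumnDescriptor cfDesc = new HColumnDescriptor("myCF");
+cfDesc.setInMemory(true);  // give this family's blocks in-memory priority in the block cache
+----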
+
+[[perf.compression]]
+=== Compression
+
+Production systems should use compression with their ColumnFamily definitions.
+See <<compression>> for more information.
+
+[[perf.compression.however]]
+==== However...
+
+Compression deflates data _on disk_.
+When it's in-memory (e.g., in the MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated.
+So while using ColumnFamily compression is a best practice, it's not going to completely eliminate the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
+
+See <<keysize>> for schema design tips, and <<keyvalue>> for more information on how HBase stores data internally.
+
+[[perf.general]]
+== HBase General Patterns
+
+[[perf.general.constants]]
+=== Constants
+
+When people get started with HBase they have a tendency to write code that looks like this:
+
+[source,java]
+----
+Get get = new Get(rowkey);
+Result r = table.get(get);
+byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value
+----
+
+But especially when inside loops (and MapReduce jobs), converting the columnFamily and column-names to byte-arrays repeatedly is surprisingly expensive.
+It's better to use constants for the byte-arrays, like this:
+
+[source,java]
+----
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(rowkey);
+Result r = table.get(get);
+byte[] b = r.getValue(CF, ATTR);  // returns current version of value
+----
+
+[[perf.writing]]
+== Writing to HBase
+
+[[perf.batch.loading]]
+=== Batch Loading
+
+Use the bulk load tool if you can.
+See <<arch.bulk.load>>.
+Otherwise, pay attention to the below.
+
+[[precreate.regions]]
+===  Table Creation: Pre-Creating Regions
+
+Tables in HBase are initially created with one region by default.
+For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster.
+A useful pattern to speed up the bulk import process is to pre-create empty regions.
+Be somewhat conservative in this, because too-many regions can actually degrade performance.
+
+There are two different approaches to pre-creating splits.
+The first approach is to rely on the default `Admin` strategy (which is implemented in `Bytes.split`)...
+
+[source,java]
+----
+
+byte[] startKey = ...;      // your lowest key
+byte[] endKey = ...;        // your highest key
+int numberOfRegions = ...;  // # of regions to create
+admin.createTable(table, startKey, endKey, numberOfRegions);
+----
+
+And the other approach is to define the splits yourself...
+
+[source,java]
+----
+byte[][] splits = ...;   // create your own splits
+admin.createTable(table, splits);
+----
+
+See <<rowkey.regionsplits>> for issues related to understanding your keyspace and pre-creating regions.
+See <<manual_region_splitting_decisions,manual region splitting decisions>>  for discussion on manually pre-splitting regions.
+
+[[def.log.flush]]
+===  Table Creation: Deferred Log Flush
+
+The default behavior for Puts using the Write Ahead Log (WAL) is that `WAL` edits will be written immediately.
+If deferred log flush is used, WAL edits are kept in memory until the flush period.
+The benefit is aggregated and asynchronous `WAL` writes, but the potential downside is that if the RegionServer goes down, the yet-to-be-flushed edits are lost.
+This is safer, however, than not using WAL at all with Puts.
+
+Deferred log flush can be configured on tables via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor].
+The default value of `hbase.regionserver.optionallogflushinterval` is 1000ms.
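+
+With the 0.98/1.0 client, deferred log flush is expressed as a durability setting on the table descriptor; a hedged sketch (the table name is illustrative):
+
+[source,java]
+----
+HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("myTable"));
+tableDesc.setDurability(Durability.ASYNC_WAL);  // WAL edits are flushed asynchronously
+----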
+
+[[perf.hbase.client.autoflush]]
+=== HBase Client: AutoFlush
+
+When performing a lot of Puts, make sure that setAutoFlush is set to false on your link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instance.
+Otherwise, the Puts will be sent one at a time to the RegionServer.
+Puts added via `table.put(Put)` and `table.put(List<Put>)` wind up in the same write buffer.
+If `autoFlush = false`, these messages are not sent until the write-buffer is filled.
+To explicitly flush the messages, call `flushCommits`.
+Calling `close` on the `Table` instance will invoke `flushCommits`.
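+
+A sketch of the pattern using the `HTable`-specific methods in the 0.9x/1.0 client (`conf` and `puts` are assumed to exist; `setAutoFlush` is deprecated but still present):
+
+[source,java]
+----
+HTable table = new HTable(conf, "myTable");
+table.setAutoFlush(false);   // buffer Puts client-side
+for (Put put : puts) {
+  table.put(put);            // goes into the write buffer, not to the RegionServer yet
+}
+table.flushCommits();        // explicit flush of the buffered Puts
+table.close();               // close() also flushes
+----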
+
+[[perf.hbase.client.putwal]]
+=== HBase Client: Turn off WAL on Puts
+
+A frequent request is to disable the WAL to increase performance of Puts.
+This is only appropriate for bulk loads, as it puts your data at risk by removing the protection of the WAL in the event of a region server crash.
+Bulk loads can be re-run in the event of a crash, with little risk of data loss.
+
+WARNING: If you disable the WAL for anything other than bulk loads, your data is at risk.
+
+In general, it is best to use WAL for Puts, and where loading throughput is a concern to use bulk loading techniques instead.
+For normal Puts, you are not likely to see a performance improvement which would outweigh the risk.
+To disable the WAL, see <<wal.disable>>.
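+
+If you do decide to skip the WAL for a particular Put, the durability can be set per operation; a hedged sketch (`CF`, `ATTR`, `rowkey`, and `value` are assumed byte arrays):
+
+[source,java]
+----
+Put put = new Put(rowkey);
+put.addColumn(CF, ATTR, value);
+put.setDurability(Durability.SKIP_WAL);  // WARNING: this edit is lost if the RegionServer crashes before a flush
+table.put(put);
+----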
+
+[[perf.hbase.client.regiongroup]]
+=== HBase Client: Group Puts by RegionServer
+
+In addition to using the writeBuffer, grouping `Put`s by RegionServer can reduce the number of client RPC calls per writeBuffer flush.
+There is a utility `HTableUtil` currently on TRUNK that does this; for those still on 0.90.x or earlier, you can either copy that or implement your own version.
+
+[[perf.hbase.write.mr.reducer]]
+=== MapReduce: Skip The Reducer
+
+When writing a lot of data to an HBase table from a MR job (e.g., with link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat]), and specifically where Puts are being emitted from the Mapper, skip the Reducer step.
+When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node.
+It's far more efficient to just write directly to HBase.
+
+For summary jobs where HBase is used as a source and a sink, writes will come from the Reducer step (e.g., summarize values, then write out the result). This is a different processing problem from the above case.
+
+[[perf.one.region]]
+=== Anti-Pattern: One Hot Region
+
+If all your data is being written to one region at a time, then re-read the section on processing timeseries data.
+
+Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy.
+There are a variety of reasons that regions may appear "well split" but won't work with your data.
+As the HBase client communicates directly with the RegionServers, this can be obtained via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte[])[Table.getRegionLocation].
+
+See <<precreate.regions>>, as well as <<perf.configurations>>.
+
+[[perf.reading]]
+== Reading from HBase
+
+The mailing list can help if you are having performance issues.
+For example, here is a good general thread on what to look at addressing read-time issues: link:http://search-hadoop.com/m/qOo2yyHtCC1[HBase Random Read latency > 100ms]
+
+[[perf.hbase.client.caching]]
+=== Scan Caching
+
+If HBase is used as an input source for a MapReduce job, for example, make sure that the input link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instance to the MapReduce job has `setCaching` set to something greater than the default (which is 1). Using the default value means that the map-task will make a callback to the RegionServer for every record processed.
+Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed.
+There is a cost/benefit to having a large cache value, because it costs more memory for both client and RegionServer, so bigger isn't always better.
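+
+For example (the caching value is illustrative; see the block cache discussion below regarding `setCacheBlocks`):
+
+[source,java]
+----
+Scan scan = new Scan();
+scan.setCaching(500);        // rows fetched per RPC, instead of the default of 1
+scan.setCacheBlocks(false);  // usually false for full-table MapReduce scans
+ResultScanner scanner = table.getScanner(scan);
+----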
+
+[[perf.hbase.client.caching.mr]]
+==== Scan Caching in MapReduce Jobs
+
+Scan settings in MapReduce jobs deserve special attention.
+Timeouts can result (e.g., UnknownScannerException) in Map tasks if it takes longer to process a batch of records than the scanner timeout allows before the client goes back to the RegionServer for the next set of data.
+This problem can occur because there is non-trivial processing occurring per row.
+If you process rows quickly, set caching higher.
+If you process rows more slowly (e.g., lots of transformations per row, writes), then set caching lower.
+
+Timeouts can also happen in a non-MapReduce use case (i.e., single threaded HBase client doing a Scan), but the processing that is often performed in MapReduce jobs tends to exacerbate this issue.
+
+[[perf.hbase.client.selection]]
+=== Scan Attribute Selection
+
+Whenever a Scan is used to process large numbers of rows (and especially when used as a MapReduce source), be aware of which attributes are selected.
+If `scan.addFamily` is called then _all_ of the attributes in the specified ColumnFamily will be returned to the client.
+If only a small number of the available attributes are to be processed, then only those attributes should be specified in the input scan because attribute over-selection is a non-trivial performance penalty over large datasets.
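+
+For example, request only the one attribute that will actually be processed, rather than the whole family:
+
+[source,java]
+----
+Scan scan = new Scan();
+scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // only this column is returned
+// scan.addFamily(Bytes.toBytes("cf"));                      // would return every column in 'cf'
+----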
+
+[[perf.hbase.client.seek]]
+=== Avoid scan seeks
+
+When columns are selected explicitly with `scan.addColumn`, HBase will schedule seek operations to seek between the selected columns.
+When rows have few columns and each column has only a few versions this can be inefficient.
+A seek operation is generally slower if it does not seek at least past 5-10 columns/versions or 512-1024 bytes.
+
+In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set on the Scan object.
+The following code instructs the RegionServer to attempt two iterations of next before a seek is scheduled:
+
+[source,java]
+----
+Scan scan = new Scan();
+scan.addColumn(...);
+scan.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
+table.getScanner(scan);
+----
+
+[[perf.hbase.mr.input]]
+=== MapReduce - Input Splits
+
+For MapReduce jobs that use HBase tables as a source, if there is a pattern where the "slow" map tasks seem to have the same Input Split (i.e., the RegionServer serving the data), see the Troubleshooting Case Study in <<casestudies.slownode>>.
+
+[[perf.hbase.client.scannerclose]]
+=== Close ResultScanners
+
+This isn't so much about improving performance but rather _avoiding_ performance problems.
+If you forget to close link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html[ResultScanners] you can cause problems on the RegionServers.
+Always have ResultScanner processing enclosed in a try/finally block.
+
+[source,java]
+----
+Scan scan = new Scan();
+// set attrs...
+ResultScanner rs = table.getScanner(scan);
+try {
+  for (Result r = rs.next(); r != null; r = rs.next()) {
+    // process result...
+  }
+} finally {
+  rs.close();  // always close the ResultScanner!
+}
+table.close();
+----
+
+[[perf.hbase.client.blockcache]]
+=== Block Cache
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be set to use the block cache in the RegionServer via the `setCacheBlocks` method.
+For input Scans to MapReduce jobs, this should be `false`.
+For frequently accessed rows, it is advisable to use the block cache.
+
+Cache more data by moving your Block Cache off-heap.
+See <<offheap.blockcache>>.
+
+[[perf.hbase.client.rowkeyonly]]
+=== Optimal Loading of Row Keys
+
+When performing a table link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[scan] where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a `MUST_PASS_ALL` operator to the scanner using `setFilter`.
+The filter list should include both a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter] and a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html[KeyOnlyFilter].
+Using this filter combination will result in a worst case scenario of a RegionServer reading a single value from disk and minimal network traffic to the client for a single row.
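+
+A sketch of the filter combination described above:
+
+[source,java]
+----
+Scan scan = new Scan();
+FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
+filters.addFilter(new FirstKeyOnlyFilter());
+filters.addFilter(new KeyOnlyFilter());
+scan.setFilter(filters);
+ResultScanner scanner = table.getScanner(scan);
+try {
+  for (Result result : scanner) {
+    byte[] rowKey = result.getRow();
+    // process the row key...
+  }
+} finally {
+  scanner.close();
+}
+----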
+
+[[perf.hbase.read.dist]]
+=== Concurrency: Monitor Data Spread
+
+When performing a high number of concurrent reads, monitor the data spread of the target tables.
+If the target table(s) have too few regions then the reads could likely be served from too few nodes.
+
+See <<precreate.regions>>, as well as <<perf.configurations>>.
+
+[[blooms]]
+=== Bloom Filters
+
+Enabling Bloom filters can save you having to go to disk and can help improve read latencies.
+
+link:http://en.wikipedia.org/wiki/Bloom_filter[Bloom filters] were developed over in link:https://issues.apache.org/jira/browse/HBASE-1200[HBase-1200 Add bloomfilters].
+For description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the _Development Process_ section of the document link:https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf[BloomFilters in HBase] attached to link:https://issues.apache.org/jira/browse/HBASE-1200[HBASE-1200].
+The bloom filters described here are actually version two of blooms in HBase.
+In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the link:http://www.one-lab.org[European Commission One-Lab Project 034819].
+The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
+Version 1 of HBase blooms never worked that well.
+Version 2 is a rewrite from scratch though again it starts with the one-lab work.
+
+See also <<schema.bloom>>.
+
+[[bloom_footprint]]
+==== Bloom StoreFile footprint
+
+Bloom filters add an entry to the `StoreFile` general `FileInfo` data structure and then two extra entries to the `StoreFile` metadata section.
+
+===== BloomFilter in the `StoreFile` `FileInfo` data structure
+
+`FileInfo` has a `BLOOM_FILTER_TYPE` entry which is set to `NONE`, `ROW` or `ROWCOL`.
+
+===== BloomFilter entries in `StoreFile` metadata
+
+`BLOOM_FILTER_META` holds Bloom Size, Hash Function used, etc.
+It's small in size and is cached on `StoreFile.Reader` load.
+
+`BLOOM_FILTER_DATA` is the actual bloomfilter data.
+Obtained on-demand.
+Stored in the LRU cache, if it is enabled (It's enabled by default).
+
+[[config.bloom]]
+==== Bloom Filter Configuration
+
+===== `io.hfile.bloom.enabled` global kill switch
+
+`io.hfile.bloom.enabled` in `Configuration` serves as the kill switch in case something goes wrong.
+Default = `true`.
+
+===== `io.hfile.bloom.error.rate`
+
+`io.hfile.bloom.error.rate` = average false positive rate.
+Default = 1%. Decrease the rate by ½ (e.g., to .5%) == +1 bit per bloom entry.
+
+===== `io.hfile.bloom.max.fold`
+
+`io.hfile.bloom.max.fold` = guaranteed minimum fold rate.
+Most people should leave this alone.
+Default = 7, or can collapse to at least 1/128th of original size.
+See the _Development Process_ section of the document link:https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf[BloomFilters in HBase] for more on what this option means.
+
+=== Hedged Reads
+
+Hedged reads are a feature of HDFS, introduced in link:https://issues.apache.org/jira/browse/HDFS-5776[HDFS-5776].
+Normally, a single thread is spawned for each read request.
+However, if hedged reads are enabled, the client waits some configurable amount of time, and if the read does not return, the client spawns a second read request, against a different block replica of the same data.
+Whichever read returns first is used, and the other read request is discarded.
+Hedged reads can be helpful for times where a rare slow read is caused by a transient error such as a failing disk or flaky network connection.
+
+Because an HBase RegionServer is an HDFS client, you can enable hedged reads in HBase by adding the following properties to the RegionServer's _hbase-site.xml_ and tuning the values to suit your environment.
+
+.Configuration for Hedged Reads
+* `dfs.client.hedged.read.threadpool.size` - the number of threads dedicated to servicing hedged reads.
+  If this is set to 0 (the default), hedged reads are disabled.
+* `dfs.client.hedged.read.threshold.millis` - the number of milliseconds to wait before spawning a second read thread.
+
+.Hedged Reads Configuration Example
+====
+[source,xml]
+----
+<property>
+  <name>dfs.client.hedged.read.threadpool.size</name>
+  <value>20</value>  <!-- 20 threads -->
+</property>
+<property>
+  <name>dfs.client.hedged.read.threshold.millis</name>
+  <value>10</value>  <!-- 10 milliseconds -->
+</property>
+----
+====
+
+Use the following metrics to tune the settings for hedged reads on your cluster.
+See <<hbase_metrics>>  for more information.
+
+.Metrics for Hedged Reads
+* hedgedReadOps - the number of times hedged read threads have been triggered.
+  This could indicate that read requests are often slow, or that hedged reads are triggered too quickly.
+* hedgeReadOpsWin - the number of times the hedged read thread was faster than the original thread.
+  This could indicate that a given RegionServer is having trouble servicing requests.
+
+[[perf.deleting]]
+== Deleting from HBase
+
+[[perf.deleting.queue]]
+=== Using HBase Tables as Queues
+
+HBase tables are sometimes used as queues.
+In this case, special care must be taken to regularly perform major compactions on tables used in this manner.
+As is documented in <<datamodel>>, marking rows as deleted creates additional StoreFiles which then need to be processed on reads.
+Tombstones only get cleaned up with major compactions.
+
+See also <<compaction>> and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact%28java.lang.String%29[Admin.majorCompact].
+
+[[perf.deleting.rpc]]
+=== Delete RPC Behavior
+
+Be aware that `Table.delete(Delete)` doesn't use the writeBuffer.
+It will execute a RegionServer RPC with each invocation.
+For a large number of deletes, consider `Table.delete(List)`.
+
+See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete%28org.apache.hadoop.hbase.client.Delete%29
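+
+A sketch of batching deletes (`table` and `rowkeysToDelete` are assumed to exist):
+
+[source,java]
+----
+List<Delete> deletes = new ArrayList<Delete>();
+for (byte[] rowkey : rowkeysToDelete) {
+  deletes.add(new Delete(rowkey));
+}
+table.delete(deletes);  // batches the deletes instead of issuing one RPC per Delete
+----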
+
+[[perf.hdfs]]
+== HDFS
+
+Because HBase runs on <<arch.hdfs>> it is important to understand how it works and how it affects HBase.
+
+[[perf.hdfs.curr]]
+=== Current Issues With Low-Latency Reads
+
+The original use-case for HDFS was batch processing.
+As such, low-latency reads were historically not a priority.
+With the increased adoption of Apache HBase this is changing, and several improvements are already in development.
+See the link:https://issues.apache.org/jira/browse/HDFS-1599[Umbrella Jira Ticket for HDFS Improvements for HBase].
+
+[[perf.hdfs.configs.localread]]
+=== Leveraging local data
+
+Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via link:https://issues.apache.org/jira/browse/HDFS-2246[HDFS-2246], it is possible for the DFSClient to take a "short circuit" and read directly from the disk instead of going through the DataNode when the data is local.
+What this means for HBase is that the RegionServers can read directly off their machine's disks instead of having to open a socket to talk to the DataNode, the former being generally much faster.
+See JD's link:http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf[Performance Talk].
+Also see link:http://search-hadoop.com/m/zV6dKrLCVh1[HBase, mail # dev - read short circuit] thread for more discussion around short circuit reads.
+
+To enable "short circuit" reads, it will depend on your version of Hadoop.
+The original shortcircuit read patch was much improved upon in Hadoop 2 in link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347].
+See http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/ for details on the difference between the old and new implementations.
+See link:http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html[Hadoop shortcircuit reads configuration page] for how to enable the latter, better version of shortcircuit.
+For example, here is a minimal config enabling short-circuit reads, added to _hbase-site.xml_:
+
+[source,xml]
+----
+<property>
+  <name>dfs.client.read.shortcircuit</name>
+  <value>true</value>
+  <description>
+    This configuration parameter turns on short-circuit local reads.
+  </description>
+</property>
+<property>
+  <name>dfs.domain.socket.path</name>
+  <value>/home/stack/sockets/short_circuit_read_socket_PORT</value>
+  <description>
+    Optional.  This is a path to a UNIX domain socket that will be used for
+    communication between the DataNode and local HDFS clients.
+    If the string "_PORT" is present in this path, it will be replaced by the
+    TCP port of the DataNode.
+  </description>
+</property>
+----
+
+Be careful about permissions for the directory that hosts the shared domain socket; the DFSClient will complain if it is open to users other than the HBase user.
+
+If you are running on an old Hadoop, one that is without link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347] but that has link:https://issues.apache.org/jira/browse/HDFS-2246[HDFS-2246], you must set two configurations.
+First, the hdfs-site.xml needs to be amended.
+Set the property `dfs.block.local-path-access.user` to be the _only_ user that can use the shortcut.
+This has to be the user that started HBase.
+Then in hbase-site.xml, set `dfs.client.read.shortcircuit` to be `true`.
+
+Services -- at least the HBase RegionServers -- will need to be restarted in order to pick up the new configurations.
+
+.dfs.client.read.shortcircuit.buffer.size
+[NOTE]
+====
+The default for this value is too high when running on a highly trafficked HBase.
+In HBase, if this value has not been set, we set it down from the default of 1M to 128k (Since HBase 0.98.0 and 0.96.1). See link:https://issues.apache.org/jira/browse/HBASE-8143[HBASE-8143 HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM]). The Hadoop DFSClient in HBase will allocate a direct byte buffer of this size for _each_ block it has open; given HBase keeps its HDFS files open all the time, this can add up quickly.
+====
+
+[[perf.hdfs.comp]]
+=== Performance Comparisons of HBase vs. HDFS
+
+A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues, returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this processing context.
+There is room for improvement and this gap will, over time, be reduced, but HDFS will always be faster in this use-case.
+
+[[perf.ec2]]
+== Amazon EC2
+
+Performance questions are common on Amazon EC2 environments because it is a shared environment.
+You will not see the same throughput as a dedicated server.
+In terms of running tests on EC2, run them several times for the same reason (i.e., it's a shared environment and you don't know what else is happening on the server).
+
+If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front, because EC2 issues are practically a separate class of performance issues.
+
+[[perf.hbase.mr.cluster]]
+== Collocating HBase and MapReduce
+
+It is often recommended to have different clusters for HBase and MapReduce.
+A better qualification of this is: don't collocate an HBase that serves live requests with a heavy MR workload.
+OLTP and OLAP-optimized systems have conflicting requirements and one will lose to the other, usually the former.
+For example, short latency-sensitive disk reads will have to wait in line behind longer reads that are trying to squeeze out as much throughput as possible.
+MR jobs that write to HBase will also generate flushes and compactions, which will in turn invalidate blocks in the <<block.cache>>.
+
+If you need to process the data from your live HBase cluster in MR, you can ship the deltas with <<copy.table>> or use replication to get the new data in real time on the OLAP cluster.
+In the worst case, if you really need to collocate both, set MR to use less Map and Reduce slots than you'd normally configure, possibly just one.
+
+When HBase is used for OLAP operations, it's preferable to set it up in a hardened way like configuring the ZooKeeper session timeout higher and giving more memory to the MemStores (the argument being that the Block Cache won't be used much since the workloads are usually long scans).
+
+[[perf.casestudy]]
+== Case Studies
+
+For Performance and Troubleshooting Case Studies, see <<casestudies>>.
+
+ifdef::backend-docbook[]
+[index]
+== Index
+// Generated automatically by the DocBook toolchain.
+endif::backend-docbook[]

http://git-wip-us.apache.org/repos/asf/hbase/blob/33fe79cf/src/main/asciidoc/_chapters/preface.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/preface.adoc b/src/main/asciidoc/_chapters/preface.adoc
new file mode 100644
index 0000000..960fcc4
--- /dev/null
+++ b/src/main/asciidoc/_chapters/preface.adoc
@@ -0,0 +1,64 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[preface]
+= Preface
+:doctype: article
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+This is the official reference guide for the link:http://hbase.apache.org/[HBase] version it ships with.
+
+Herein you will find either the definitive documentation on an HBase topic as of its standing when the referenced HBase version shipped, or it will point to the location in link:http://hbase.apache.org/apidocs/index.html[Javadoc], link:https://issues.apache.org/jira/browse/HBASE[JIRA] or link:http://wiki.apache.org/hadoop/Hbase[wiki] where the pertinent information can be found.
+
+.About This Guide
+This reference guide is a work in progress. The source for this guide can be found in the _src/main/asciidoc_ directory of the HBase source. This reference guide is marked up using link:http://asciidoc.org/[AsciiDoc] from which the finished guide is generated as part of the 'site' build target. Run
+[source,bourne]
+----
+mvn site
+----
+to generate this documentation.
+Amendments and improvements to the documentation are welcomed.
+Click link:https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12310753&issuetype=1&components=12312132&summary=SHORT+DESCRIPTION[this link] to file a new documentation bug against Apache HBase with some values pre-selected.
+
+.Contributing to the Documentation
+For an overview of AsciiDoc and suggestions to get started contributing to the documentation, see the <<appendix_contributing_to_documentation,relevant section later in this documentation>>.
+
+.Heads-up if this is your first foray into the world of distributed computing...
+If this is your first foray into the wonderful world of Distributed Computing, then you are in for some interesting times.
+First off, distributed systems are hard; making a distributed system hum requires a disparate skillset that spans systems (hardware and software) and networking.
+
+Your cluster's operation can hiccup for any of a myriad of reasons: bugs in HBase itself, misconfigurations (of HBase, but also of the operating system), or hardware problems, whether a bug in your network card drivers or an under-provisioned RAM bus (to mention two recent examples of hardware issues that manifested as "HBase is slow"). You will also need to recalibrate if, up to this point, your computing has been bound to a single box.
+Here is one good starting point: link:http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing[Fallacies of Distributed Computing].
+
+That said, you are welcome. +
+It's a fun place to be. +
+Yours, the HBase Community.
+
+.Reporting Bugs
+
+Please use link:https://issues.apache.org/jira/browse/hbase[JIRA] to report non-security-related bugs. 
+
+To protect existing HBase installations from new vulnerabilities, please *do not* use JIRA to report security-related bugs. Instead, send your report to the mailing list private@apache.org, which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report.
+
+:numbered:

http://git-wip-us.apache.org/repos/asf/hbase/blob/33fe79cf/src/main/asciidoc/_chapters/rpc.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/rpc.adoc b/src/main/asciidoc/_chapters/rpc.adoc
new file mode 100644
index 0000000..43e7156
--- /dev/null
+++ b/src/main/asciidoc/_chapters/rpc.adoc
@@ -0,0 +1,222 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[appendix]
+[[hbase.rpc]]
+== 0.95 RPC Specification
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+In 0.95, all client/server communication is done with link:https://developers.google.com/protocol-buffers/[protobuf'ed] Messages rather than with link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html[Hadoop Writables].
+Our RPC wire format therefore changes.
+This document describes the client/server request/response protocol and our new RPC wire-format.
+
+
+
+For what RPC is like in 0.94 and previous, see Benoît/Tsuna's link:https://github.com/OpenTSDB/asynchbase/blob/master/src/HBaseRpc.java#L164[Unofficial Hadoop / HBase RPC protocol documentation].
+For more background on how we arrived at this spec, see link:https://docs.google.com/document/d/1WCKwgaLDqBw2vpux0jPsAu2WPTRISob7HGCO8YhfDTA/edit#[HBase RPC: WIP].
+
+
+
+=== Goals
+
+
+
+. A wire-format we can evolve
+. A format that does not require rewriting our server core or radically changing its current architecture (for later).
+
+=== TODO
+
+
+
+. List of problems with currently specified format and where we would like to go in a version2, etc.
+  For example, what would we have to change if anything to move server async or to support streaming/chunking?
+. Diagram on how it works
+. A grammar that succinctly describes the wire-format.
+  Currently we have these words and the content of the RPC protobuf IDL, but a grammar for the back and forth would help with grokking RPC.
+  Also, a little state machine on client/server interactions would help with understanding (and ensuring correct implementation).
+
+=== RPC
+
+The client will send setup information on connection establishment.
+Thereafter, the client invokes methods against the remote server sending a protobuf Message and receiving a protobuf Message in response.
+Communication is synchronous.
+All back and forth is preceded by an int that has the total length of the request/response.
+Optionally, Cells (KeyValues) can be passed outside of protobufs in follow-behind Cell blocks (because link:https://docs.google.com/document/d/1WEtrq-JTIUhlnlnvA0oYRLp0F8MKpEBeBSCFcQiacdw/edit#[we can't protobuf megabytes of KeyValues] or Cells). These CellBlocks are encoded and optionally compressed.
+
+
+
+For more detail on the protobufs involved, see the link:http://svn.apache.org/viewvc/hbase/trunk/hbase-protocol/src/main/protobuf/RPC.proto?view=markup[RPC.proto] file in trunk.
+
+==== Connection Setup
+
+Client initiates connection.
+
+===== Client
+On connection setup, client sends a preamble followed by a connection header. 
+
+.<preamble>
+[source]
+----
+<MAGIC 4 byte integer> <1 byte RPC Format Version> <1 byte auth type>
+----
+
+We need the auth method spec here so the connection header is encoded if auth is enabled.
+
+For example: HBas0x000x50 -- 4 bytes of MAGIC (`HBas`), plus one byte of version (0 in this case), and one byte of auth type (0x50, SIMPLE).
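+
+As a rough illustration only (this is not the actual HBase client code), writing that preamble in Java could look like the following; the constants come straight from the description above:
+
+[source,java]
+----
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+public class PreambleSketch {
+  static void writePreamble(OutputStream out) throws IOException {
+    DataOutputStream dos = new DataOutputStream(out);
+    dos.writeBytes("HBas"); // 4-byte MAGIC
+    dos.writeByte(0);       // 1-byte RPC format version
+    dos.writeByte(0x50);    // 1-byte auth type, SIMPLE in this example
+    dos.flush();
+  }
+}
+----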
+
+.<Protobuf ConnectionHeader Message>
+Has user info and ``protocol'', as well as the encoders and compression the client will use when sending CellBlocks.
+CellBlock encoders and compressors are for the life of the connection.
+CellBlock encoders implement org.apache.hadoop.hbase.codec.Codec.
+CellBlocks may then also be compressed.
+Compressors implement org.apache.hadoop.io.compress.CompressionCodec.
+This protobuf is written using writeDelimited, so it is prefaced by a pb varint holding its serialized length.
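+
+Assuming the generated `RPCProtos.ConnectionHeader` class from _RPC.proto_ and standard protobuf delimited framing, the client side could look roughly like the sketch below; the service, codec, and compressor names are examples, not requirements.
+
+[source,java]
+----
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.hadoop.hbase.protobuf.generated.RPCProtos.ConnectionHeader;
+
+public class ConnectionHeaderSketch {
+  static void writeConnectionHeader(OutputStream out) throws IOException {
+    ConnectionHeader header = ConnectionHeader.newBuilder()
+        .setServiceName("ClientService") // the Service this connection will talk to
+        .setCellBlockCodecClass("org.apache.hadoop.hbase.codec.KeyValueCodec")
+        .setCellBlockCompressorClass("org.apache.hadoop.io.compress.GzipCodec")
+        .build();
+    // writeDelimitedTo prefixes the serialized message with its varint length,
+    // which is the "writeDelimited" framing described above.
+    header.writeDelimitedTo(out);
+  }
+}
+----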
+
+===== Server
+
+After the client sends the preamble and connection header, the server does NOT respond if connection setup succeeded.
+No response means the server is READY to accept requests and to give out responses.
+If the version or authentication in the preamble is not agreeable or the server has trouble parsing the preamble, it will throw an org.apache.hadoop.hbase.ipc.FatalConnectionException explaining the error and will then disconnect.
+If the client in the connection header -- i.e. the protobuf'd Message that comes after the connection preamble -- asks for a Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with an explanation.
+
+==== Request
+
+After a Connection has been set up, client makes requests.
+Server responds.
+
+A request is made up of a protobuf RequestHeader followed by a protobuf Message parameter.
+The header includes the method name and optionally, metadata on the optional CellBlock that may be following.
+The parameter type suits the method being invoked: i.e.
+if we are doing a getRegionInfo request, the protobuf Message param will be an instance of GetRegionInfoRequest.
+The response will be a GetRegionInfoResponse.
+The CellBlock is optionally used for ferrying the bulk of the RPC data: i.e. Cells/KeyValues.
+
+===== Request Parts
+
+.<Total Length>
+The request is prefaced by an int that holds the total length of what follows.
+
+.<Protobuf RequestHeader Message>
+Will have call.id, trace.id, method name, etc., including optional metadata on the CellBlock IFF one is following.
+Data is protobuf'd inline in this pb Message, or optionally comes in the following CellBlock.
+
+.<Protobuf Param Message>
+If the method being invoked is getRegionInfo, if you study the Service descriptor for the client to regionserver protocol, you will find that the request sends a GetRegionInfoRequest protobuf Message param in this position.
+
+.<CellBlock>
+An encoded and optionally compressed Cell block.
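+
+To make the framing concrete, here is a rough sketch of how a request without a CellBlock could be written; it assumes the generated `RPCProtos.RequestHeader` class and is not the actual client implementation.
+
+[source,java]
+----
+import java.io.ByteArrayOutputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+import com.google.protobuf.Message;
+import org.apache.hadoop.hbase.protobuf.generated.RPCProtos.RequestHeader;
+
+public class RequestFramingSketch {
+  // Writes: <Total Length> <delimited RequestHeader> <delimited param Message>.
+  // A CellBlock, if present, would follow and be described by the header's CellBlockMeta.
+  static void writeRequest(OutputStream out, int callId, String method, Message param)
+      throws IOException {
+    RequestHeader header = RequestHeader.newBuilder()
+        .setCallId(callId)
+        .setMethodName(method)   // e.g. "GetRegionInfo"
+        .setRequestParam(true)   // a param Message follows the header
+        .build();
+
+    // Serialize header and param (each with its varint length prefix) first,
+    // so the total length can be written up front.
+    ByteArrayOutputStream body = new ByteArrayOutputStream();
+    header.writeDelimitedTo(body);
+    param.writeDelimitedTo(body);
+
+    DataOutputStream dos = new DataOutputStream(out);
+    dos.writeInt(body.size()); // <Total Length>
+    body.writeTo(dos);
+    dos.flush();
+  }
+}
+----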
+
+==== Response
+
+Same as the Request, it is a protobuf ResponseHeader followed by a protobuf Message response, where the Message response type suits the method invoked.
+The bulk of the data may come in a following CellBlock.
+
+===== Response Parts
+
+.<Total Length>
+The response is prefaced by an int that holds the total length of what follows.
+
+.<Protobuf ResponseHeader Message>
+Will have call.id, etc.
+Will include an exception if processing failed.
+Optionally includes metadata on the CellBlock, IFF one is following.
+
+.<Protobuf Response Message>
+
+The return value, which may be absent if an exception was thrown.
+If the method being invoked is getRegionInfo, if you study the Service descriptor for the client to regionserver protocol, you will find that the response sends a GetRegionInfoResponse protobuf Message param in this position.
+
+.<CellBlock>
+
+An encoded and optionally compressed Cell block.
+
+==== Exceptions
+
+There are two distinct types.
+The first is a failed request, which is encapsulated inside the ResponseHeader of the response; the connection stays open to receive new requests.
+The second type, the FatalConnectionException, kills the connection.
+
+Exceptions can carry extra information.
+See the ExceptionResponse protobuf type.
+It has a flag to indicate do-not-retry as well as other miscellaneous payload to help improve client responsiveness.
+
+==== CellBlocks
+
+These are not versioned.
+The server can either do the codec or it cannot.
+If there is a new version of a codec with, say, a tighter encoding, give it a new class name.
+Codecs will live on the server for all time, so old clients can still connect.
+
+=== Notes
+
+.Constraints
+In some part, the current wire-format -- i.e. all requests and responses preceded by a length -- has been dictated by the current non-async server architecture.
+
+.One fat pb request or header+param
+We went with a pb header followed by a pb param to make a request, and a pb header followed by a pb response, for now.
+Doing header+param rather than a single protobuf Message with both header and param content:
+
+. Is closer to what we currently have
+. Having a single fat pb requires extra copying, putting the already pb'd param into the body of the fat request pb (and the same when making the result)
+. We can decide whether to accept the request or not before we read the param; for example, the request might be low priority.
+  As is, we read header+param in one go as the server is currently implemented, so this is a TODO.
+
+The advantages are minor.
+If, later, the fat request turns out to have a clear advantage, we can roll out a v2 then.
+
+[[rpc.configs]]
+==== RPC Configurations
+
+.CellBlock Codecs
+To enable a codec other than the default `KeyValueCodec`, set `hbase.client.rpc.codec` to the name of the Codec class to use.
+The codec must implement HBase's `Codec` interface.
+After connection setup, all passed cellblocks will be sent with this codec.
+The server will return cellblocks using this same codec as long as the codec is on the server's CLASSPATH (else you will get `UnsupportedCellCodecException`).
+
+To change the default codec, set `hbase.client.default.rpc.codec`. 
+
+To disable cellblocks completely and to go pure protobuf, set the default to the empty String and do not specify a codec in your Configuration.
+So, set `hbase.client.default.rpc.codec` to the empty string and do not set `hbase.client.rpc.codec`.
+This will cause the client to connect to the server with no codec specified.
+If a server sees no codec, it will return all responses in pure protobuf.
+Running pure protobuf all the time will be slower than running with cellblocks. 
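+
+A minimal client-side sketch of both options above; `KeyValueCodec` is just one example of an explicit codec choice:
+
+[source,java]
+----
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+
+public class RpcCodecConfigSketch {
+  public static void main(String[] args) {
+    // Option 1: pick an explicit CellBlock codec for this client.
+    Configuration conf = HBaseConfiguration.create();
+    conf.set("hbase.client.rpc.codec", "org.apache.hadoop.hbase.codec.KeyValueCodec");
+
+    // Option 2: go pure protobuf -- empty default codec, and do not set
+    // hbase.client.rpc.codec at all.
+    Configuration pureProtobuf = HBaseConfiguration.create();
+    pureProtobuf.set("hbase.client.default.rpc.codec", "");
+  }
+}
+----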
+
+.Compression
+Uses Hadoop's compression codecs.
+To enable compression of passed CellBlocks, set `hbase.client.rpc.compressor` to the name of the Compressor to use.
+The compressor must implement Hadoop's `CompressionCodec` interface.
+After connection setup, all passed cellblocks will be sent compressed.
+The server will return cellblocks compressed using this same compressor as long as the compressor is on its CLASSPATH (else you will get `UnsupportedCompressionCodecException`).
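+
+For example (a sketch only; `GzipCodec` is just one Hadoop `CompressionCodec` that could be named here):
+
+[source,java]
+----
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+
+public class RpcCompressorConfigSketch {
+  public static void main(String[] args) {
+    Configuration conf = HBaseConfiguration.create();
+    // Compress CellBlocks with a Hadoop CompressionCodec also present on the server CLASSPATH.
+    conf.set("hbase.client.rpc.compressor", "org.apache.hadoop.io.compress.GzipCodec");
+  }
+}
+----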
+
+:numbered: