Posted to commits@hbase.apache.org by mi...@apache.org on 2015/12/23 00:25:13 UTC

[50/51] [partial] hbase-site git commit: Published site at 95a13b51ee052eb73882682e8f009bfa1e914866.

http://git-wip-us.apache.org/repos/asf/hbase-site/blob/32d40534/book.html
----------------------------------------------------------------------
diff --git a/book.html b/book.html
index 78d7fbc..f4a3fc4 100644
--- a/book.html
+++ b/book.html
@@ -77,193 +77,199 @@
 <li><a href="#schema">HBase and Schema Design</a>
 <ul class="sectlevel1">
 <li><a href="#schema.creation">32. Schema Creation</a></li>
-<li><a href="#number.of.cfs">33. On the number of column families</a></li>
-<li><a href="#rowkey.design">34. Rowkey Design</a></li>
-<li><a href="#schema.versions">35. Number of Versions</a></li>
-<li><a href="#supported.datatypes">36. Supported Datatypes</a></li>
-<li><a href="#schema.joins">37. Joins</a></li>
-<li><a href="#ttl">38. Time To Live (TTL)</a></li>
-<li><a href="#cf.keep.deleted">39. Keeping Deleted Cells</a></li>
-<li><a href="#secondary.indexes">40. Secondary Indexes and Alternate Query Paths</a></li>
-<li><a href="#_constraints">41. Constraints</a></li>
-<li><a href="#schema.casestudies">42. Schema Design Case Studies</a></li>
-<li><a href="#schema.ops">43. Operational and Performance Configuration Options</a></li>
+<li><a href="#table_schema_rules_of_thumb">33. Table Schema Rules Of Thumb</a></li>
+</ul>
+</li>
+<li><a href="#regionserver_sizing_rules_of_thumb">RegionServer Sizing Rules of Thumb</a>
+<ul class="sectlevel1">
+<li><a href="#number.of.cfs">34. On the number of column families</a></li>
+<li><a href="#rowkey.design">35. Rowkey Design</a></li>
+<li><a href="#schema.versions">36. Number of Versions</a></li>
+<li><a href="#supported.datatypes">37. Supported Datatypes</a></li>
+<li><a href="#schema.joins">38. Joins</a></li>
+<li><a href="#ttl">39. Time To Live (TTL)</a></li>
+<li><a href="#cf.keep.deleted">40. Keeping Deleted Cells</a></li>
+<li><a href="#secondary.indexes">41. Secondary Indexes and Alternate Query Paths</a></li>
+<li><a href="#_constraints">42. Constraints</a></li>
+<li><a href="#schema.casestudies">43. Schema Design Case Studies</a></li>
+<li><a href="#schema.ops">44. Operational and Performance Configuration Options</a></li>
 </ul>
 </li>
 <li><a href="#mapreduce">HBase and MapReduce</a>
 <ul class="sectlevel1">
-<li><a href="#hbase.mapreduce.classpath">44. HBase, MapReduce, and the CLASSPATH</a></li>
-<li><a href="#_mapreduce_scan_caching">45. MapReduce Scan Caching</a></li>
-<li><a href="#_bundled_hbase_mapreduce_jobs">46. Bundled HBase MapReduce Jobs</a></li>
-<li><a href="#_hbase_as_a_mapreduce_job_data_source_and_data_sink">47. HBase as a MapReduce Job Data Source and Data Sink</a></li>
-<li><a href="#_writing_hfiles_directly_during_bulk_import">48. Writing HFiles Directly During Bulk Import</a></li>
-<li><a href="#_rowcounter_example">49. RowCounter Example</a></li>
-<li><a href="#splitter">50. Map-Task Splitting</a></li>
-<li><a href="#mapreduce.example">51. HBase MapReduce Examples</a></li>
-<li><a href="#mapreduce.htable.access">52. Accessing Other HBase Tables in a MapReduce Job</a></li>
-<li><a href="#mapreduce.specex">53. Speculative Execution</a></li>
-<li><a href="#cascading">54. Cascading</a></li>
+<li><a href="#hbase.mapreduce.classpath">45. HBase, MapReduce, and the CLASSPATH</a></li>
+<li><a href="#_mapreduce_scan_caching">46. MapReduce Scan Caching</a></li>
+<li><a href="#_bundled_hbase_mapreduce_jobs">47. Bundled HBase MapReduce Jobs</a></li>
+<li><a href="#_hbase_as_a_mapreduce_job_data_source_and_data_sink">48. HBase as a MapReduce Job Data Source and Data Sink</a></li>
+<li><a href="#_writing_hfiles_directly_during_bulk_import">49. Writing HFiles Directly During Bulk Import</a></li>
+<li><a href="#_rowcounter_example">50. RowCounter Example</a></li>
+<li><a href="#splitter">51. Map-Task Splitting</a></li>
+<li><a href="#mapreduce.example">52. HBase MapReduce Examples</a></li>
+<li><a href="#mapreduce.htable.access">53. Accessing Other HBase Tables in a MapReduce Job</a></li>
+<li><a href="#mapreduce.specex">54. Speculative Execution</a></li>
+<li><a href="#cascading">55. Cascading</a></li>
 </ul>
 </li>
 <li><a href="#security">Securing Apache HBase</a>
 <ul class="sectlevel1">
-<li><a href="#_using_secure_http_https_for_the_web_ui">55. Using Secure HTTP (HTTPS) for the Web UI</a></li>
-<li><a href="#hbase.secure.configuration">56. Secure Client Access to Apache HBase</a></li>
-<li><a href="#hbase.secure.simpleconfiguration">57. Simple User Access to Apache HBase</a></li>
-<li><a href="#_securing_access_to_hdfs_and_zookeeper">58. Securing Access to HDFS and ZooKeeper</a></li>
-<li><a href="#_securing_access_to_your_data">59. Securing Access To Your Data</a></li>
-<li><a href="#security.example.config">60. Security Configuration Example</a></li>
+<li><a href="#_using_secure_http_https_for_the_web_ui">56. Using Secure HTTP (HTTPS) for the Web UI</a></li>
+<li><a href="#hbase.secure.configuration">57. Secure Client Access to Apache HBase</a></li>
+<li><a href="#hbase.secure.simpleconfiguration">58. Simple User Access to Apache HBase</a></li>
+<li><a href="#_securing_access_to_hdfs_and_zookeeper">59. Securing Access to HDFS and ZooKeeper</a></li>
+<li><a href="#_securing_access_to_your_data">60. Securing Access To Your Data</a></li>
+<li><a href="#security.example.config">61. Security Configuration Example</a></li>
 </ul>
 </li>
 <li><a href="#_architecture">Architecture</a>
 <ul class="sectlevel1">
-<li><a href="#arch.overview">61. Overview</a></li>
-<li><a href="#arch.catalog">62. Catalog Tables</a></li>
-<li><a href="#architecture.client">63. Client</a></li>
-<li><a href="#client.filter">64. Client Request Filters</a></li>
-<li><a href="#_master">65. Master</a></li>
-<li><a href="#regionserver.arch">66. RegionServer</a></li>
-<li><a href="#regions.arch">67. Regions</a></li>
-<li><a href="#arch.bulk.load">68. Bulk Loading</a></li>
-<li><a href="#arch.hdfs">69. HDFS</a></li>
-<li><a href="#arch.timelineconsistent.reads">70. Timeline-consistent High Available Reads</a></li>
-<li><a href="#hbase_mob">71. Storing Medium-sized Objects (MOB)</a></li>
+<li><a href="#arch.overview">62. Overview</a></li>
+<li><a href="#arch.catalog">63. Catalog Tables</a></li>
+<li><a href="#architecture.client">64. Client</a></li>
+<li><a href="#client.filter">65. Client Request Filters</a></li>
+<li><a href="#_master">66. Master</a></li>
+<li><a href="#regionserver.arch">67. RegionServer</a></li>
+<li><a href="#regions.arch">68. Regions</a></li>
+<li><a href="#arch.bulk.load">69. Bulk Loading</a></li>
+<li><a href="#arch.hdfs">70. HDFS</a></li>
+<li><a href="#arch.timelineconsistent.reads">71. Timeline-consistent High Available Reads</a></li>
+<li><a href="#hbase_mob">72. Storing Medium-sized Objects (MOB)</a></li>
 </ul>
 </li>
 <li><a href="#hbase_apis">Apache HBase APIs</a>
 <ul class="sectlevel1">
-<li><a href="#_examples">72. Examples</a></li>
+<li><a href="#_examples">73. Examples</a></li>
 </ul>
 </li>
 <li><a href="#external_apis">Apache HBase External APIs</a>
 <ul class="sectlevel1">
-<li><a href="#_rest">73. REST</a></li>
-<li><a href="#_thrift">74. Thrift</a></li>
-<li><a href="#c">75. C/C++ Apache HBase Client</a></li>
-<li><a href="#jdo">76. Using Java Data Objects (JDO) with HBase</a></li>
-<li><a href="#scala">77. Scala</a></li>
-<li><a href="#jython">78. Jython</a></li>
+<li><a href="#_rest">74. REST</a></li>
+<li><a href="#_thrift">75. Thrift</a></li>
+<li><a href="#c">76. C/C++ Apache HBase Client</a></li>
+<li><a href="#jdo">77. Using Java Data Objects (JDO) with HBase</a></li>
+<li><a href="#scala">78. Scala</a></li>
+<li><a href="#jython">79. Jython</a></li>
 </ul>
 </li>
 <li><a href="#thrift">Thrift API and Filter Language</a>
 <ul class="sectlevel1">
-<li><a href="#thrift.filter_language">79. Filter Language</a></li>
+<li><a href="#thrift.filter_language">80. Filter Language</a></li>
 </ul>
 </li>
 <li><a href="#spark">HBase and Spark</a>
 <ul class="sectlevel1">
-<li><a href="#_basic_spark">80. Basic Spark</a></li>
-<li><a href="#_spark_streaming">81. Spark Streaming</a></li>
-<li><a href="#_bulk_load">82. Bulk Load</a></li>
-<li><a href="#_sparksql_dataframes">83. SparkSQL/DataFrames</a></li>
+<li><a href="#_basic_spark">81. Basic Spark</a></li>
+<li><a href="#_spark_streaming">82. Spark Streaming</a></li>
+<li><a href="#_bulk_load">83. Bulk Load</a></li>
+<li><a href="#_sparksql_dataframes">84. SparkSQL/DataFrames</a></li>
 </ul>
 </li>
 <li><a href="#cp">Apache HBase Coprocessors</a>
 <ul class="sectlevel1">
-<li><a href="#_coprocessor_framework">84. Coprocessor Framework</a></li>
-<li><a href="#_types_of_coprocessors">85. Types of Coprocessors</a></li>
-<li><a href="#cp_loading">86. Loading Coprocessors</a></li>
-<li><a href="#cp_example">87. Examples</a></li>
-<li><a href="#_monitor_time_spent_in_coprocessors">88. Monitor Time Spent in Coprocessors</a></li>
+<li><a href="#_coprocessor_overview">85. Coprocessor Overview</a></li>
+<li><a href="#_types_of_coprocessors">86. Types of Coprocessors</a></li>
+<li><a href="#cp_loading">87. Loading Coprocessors</a></li>
+<li><a href="#cp_example">88. Examples</a></li>
+<li><a href="#_guidelines_for_deploying_a_coprocessor">89. Guidelines For Deploying A Coprocessor</a></li>
+<li><a href="#_monitor_time_spent_in_coprocessors">90. Monitor Time Spent in Coprocessors</a></li>
 </ul>
 </li>
 <li><a href="#performance">Apache HBase Performance Tuning</a>
 <ul class="sectlevel1">
-<li><a href="#perf.os">89. Operating System</a></li>
-<li><a href="#perf.network">90. Network</a></li>
-<li><a href="#jvm">91. Java</a></li>
-<li><a href="#perf.configurations">92. HBase Configurations</a></li>
-<li><a href="#perf.zookeeper">93. ZooKeeper</a></li>
-<li><a href="#perf.schema">94. Schema Design</a></li>
-<li><a href="#perf.general">95. HBase General Patterns</a></li>
-<li><a href="#perf.writing">96. Writing to HBase</a></li>
-<li><a href="#perf.reading">97. Reading from HBase</a></li>
-<li><a href="#perf.deleting">98. Deleting from HBase</a></li>
-<li><a href="#perf.hdfs">99. HDFS</a></li>
-<li><a href="#perf.ec2">100. Amazon EC2</a></li>
-<li><a href="#perf.hbase.mr.cluster">101. Collocating HBase and MapReduce</a></li>
-<li><a href="#perf.casestudy">102. Case Studies</a></li>
+<li><a href="#perf.os">91. Operating System</a></li>
+<li><a href="#perf.network">92. Network</a></li>
+<li><a href="#jvm">93. Java</a></li>
+<li><a href="#perf.configurations">94. HBase Configurations</a></li>
+<li><a href="#perf.zookeeper">95. ZooKeeper</a></li>
+<li><a href="#perf.schema">96. Schema Design</a></li>
+<li><a href="#perf.general">97. HBase General Patterns</a></li>
+<li><a href="#perf.writing">98. Writing to HBase</a></li>
+<li><a href="#perf.reading">99. Reading from HBase</a></li>
+<li><a href="#perf.deleting">100. Deleting from HBase</a></li>
+<li><a href="#perf.hdfs">101. HDFS</a></li>
+<li><a href="#perf.ec2">102. Amazon EC2</a></li>
+<li><a href="#perf.hbase.mr.cluster">103. Collocating HBase and MapReduce</a></li>
+<li><a href="#perf.casestudy">104. Case Studies</a></li>
 </ul>
 </li>
 <li><a href="#trouble">Troubleshooting and Debugging Apache HBase</a>
 <ul class="sectlevel1">
-<li><a href="#trouble.general">103. General Guidelines</a></li>
-<li><a href="#trouble.log">104. Logs</a></li>
-<li><a href="#trouble.resources">105. Resources</a></li>
-<li><a href="#trouble.tools">106. Tools</a></li>
-<li><a href="#trouble.client">107. Client</a></li>
-<li><a href="#trouble.mapreduce">108. MapReduce</a></li>
-<li><a href="#trouble.namenode">109. NameNode</a></li>
-<li><a href="#trouble.network">110. Network</a></li>
-<li><a href="#trouble.rs">111. RegionServer</a></li>
-<li><a href="#trouble.master">112. Master</a></li>
-<li><a href="#trouble.zookeeper">113. ZooKeeper</a></li>
-<li><a href="#trouble.ec2">114. Amazon EC2</a></li>
-<li><a href="#trouble.versions">115. HBase and Hadoop version issues</a></li>
-<li><a href="#_ipc_configuration_conflicts_with_hadoop">116. IPC Configuration Conflicts with Hadoop</a></li>
-<li><a href="#_hbase_and_hdfs">117. HBase and HDFS</a></li>
-<li><a href="#trouble.tests">118. Running unit or integration tests</a></li>
-<li><a href="#trouble.casestudy">119. Case Studies</a></li>
-<li><a href="#trouble.crypto">120. Cryptographic Features</a></li>
-<li><a href="#_operating_system_specific_issues">121. Operating System Specific Issues</a></li>
-<li><a href="#_jdk_issues">122. JDK Issues</a></li>
+<li><a href="#trouble.general">105. General Guidelines</a></li>
+<li><a href="#trouble.log">106. Logs</a></li>
+<li><a href="#trouble.resources">107. Resources</a></li>
+<li><a href="#trouble.tools">108. Tools</a></li>
+<li><a href="#trouble.client">109. Client</a></li>
+<li><a href="#trouble.mapreduce">110. MapReduce</a></li>
+<li><a href="#trouble.namenode">111. NameNode</a></li>
+<li><a href="#trouble.network">112. Network</a></li>
+<li><a href="#trouble.rs">113. RegionServer</a></li>
+<li><a href="#trouble.master">114. Master</a></li>
+<li><a href="#trouble.zookeeper">115. ZooKeeper</a></li>
+<li><a href="#trouble.ec2">116. Amazon EC2</a></li>
+<li><a href="#trouble.versions">117. HBase and Hadoop version issues</a></li>
+<li><a href="#_ipc_configuration_conflicts_with_hadoop">118. IPC Configuration Conflicts with Hadoop</a></li>
+<li><a href="#_hbase_and_hdfs">119. HBase and HDFS</a></li>
+<li><a href="#trouble.tests">120. Running unit or integration tests</a></li>
+<li><a href="#trouble.casestudy">121. Case Studies</a></li>
+<li><a href="#trouble.crypto">122. Cryptographic Features</a></li>
+<li><a href="#_operating_system_specific_issues">123. Operating System Specific Issues</a></li>
+<li><a href="#_jdk_issues">124. JDK Issues</a></li>
 </ul>
 </li>
 <li><a href="#casestudies">Apache HBase Case Studies</a>
 <ul class="sectlevel1">
-<li><a href="#casestudies.overview">123. Overview</a></li>
-<li><a href="#casestudies.schema">124. Schema Design</a></li>
-<li><a href="#casestudies.perftroub">125. Performance/Troubleshooting</a></li>
+<li><a href="#casestudies.overview">125. Overview</a></li>
+<li><a href="#casestudies.schema">126. Schema Design</a></li>
+<li><a href="#casestudies.perftroub">127. Performance/Troubleshooting</a></li>
 </ul>
 </li>
 <li><a href="#ops_mgt">Apache HBase Operational Management</a>
 <ul class="sectlevel1">
-<li><a href="#tools">126. HBase Tools and Utilities</a></li>
-<li><a href="#ops.regionmgt">127. Region Management</a></li>
-<li><a href="#node.management">128. Node Management</a></li>
-<li><a href="#_hbase_metrics">129. HBase Metrics</a></li>
-<li><a href="#ops.monitoring">130. HBase Monitoring</a></li>
-<li><a href="#_cluster_replication">131. Cluster Replication</a></li>
-<li><a href="#_running_multiple_workloads_on_a_single_cluster">132. Running Multiple Workloads On a Single Cluster</a></li>
-<li><a href="#ops.backup">133. HBase Backup</a></li>
-<li><a href="#ops.snapshots">134. HBase Snapshots</a></li>
-<li><a href="#ops.capacity">135. Capacity Planning and Region Sizing</a></li>
-<li><a href="#table.rename">136. Table Rename</a></li>
+<li><a href="#tools">128. HBase Tools and Utilities</a></li>
+<li><a href="#ops.regionmgt">129. Region Management</a></li>
+<li><a href="#node.management">130. Node Management</a></li>
+<li><a href="#_hbase_metrics">131. HBase Metrics</a></li>
+<li><a href="#ops.monitoring">132. HBase Monitoring</a></li>
+<li><a href="#_cluster_replication">133. Cluster Replication</a></li>
+<li><a href="#_running_multiple_workloads_on_a_single_cluster">134. Running Multiple Workloads On a Single Cluster</a></li>
+<li><a href="#ops.backup">135. HBase Backup</a></li>
+<li><a href="#ops.snapshots">136. HBase Snapshots</a></li>
+<li><a href="#ops.capacity">137. Capacity Planning and Region Sizing</a></li>
+<li><a href="#table.rename">138. Table Rename</a></li>
 </ul>
 </li>
 <li><a href="#developer">Building and Developing Apache HBase</a>
 <ul class="sectlevel1">
-<li><a href="#getting.involved">137. Getting Involved</a></li>
-<li><a href="#repos">138. Apache HBase Repositories</a></li>
-<li><a href="#_ides">139. IDEs</a></li>
-<li><a href="#build">140. Building Apache HBase</a></li>
-<li><a href="#releasing">141. Releasing Apache HBase</a></li>
-<li><a href="#hbase.rc.voting">142. Voting on Release Candidates</a></li>
-<li><a href="#documentation">143. Generating the HBase Reference Guide</a></li>
-<li><a href="#hbase.org">144. Updating <a href="http://hbase.apache.org">hbase.apache.org</a></a></li>
-<li><a href="#hbase.tests">145. Tests</a></li>
-<li><a href="#developing">146. Developer Guidelines</a></li>
+<li><a href="#getting.involved">139. Getting Involved</a></li>
+<li><a href="#repos">140. Apache HBase Repositories</a></li>
+<li><a href="#_ides">141. IDEs</a></li>
+<li><a href="#build">142. Building Apache HBase</a></li>
+<li><a href="#releasing">143. Releasing Apache HBase</a></li>
+<li><a href="#hbase.rc.voting">144. Voting on Release Candidates</a></li>
+<li><a href="#documentation">145. Generating the HBase Reference Guide</a></li>
+<li><a href="#hbase.org">146. Updating <a href="http://hbase.apache.org">hbase.apache.org</a></a></li>
+<li><a href="#hbase.tests">147. Tests</a></li>
+<li><a href="#developing">148. Developer Guidelines</a></li>
 </ul>
 </li>
 <li><a href="#unit.tests">Unit Testing HBase Applications</a>
 <ul class="sectlevel1">
-<li><a href="#_junit">147. JUnit</a></li>
-<li><a href="#_mockito">148. Mockito</a></li>
-<li><a href="#_mrunit">149. MRUnit</a></li>
-<li><a href="#_integration_testing_with_an_hbase_mini_cluster">150. Integration Testing with an HBase Mini-Cluster</a></li>
+<li><a href="#_junit">149. JUnit</a></li>
+<li><a href="#_mockito">150. Mockito</a></li>
+<li><a href="#_mrunit">151. MRUnit</a></li>
+<li><a href="#_integration_testing_with_an_hbase_mini_cluster">152. Integration Testing with an HBase Mini-Cluster</a></li>
 </ul>
 </li>
 <li><a href="#zookeeper">ZooKeeper</a>
 <ul class="sectlevel1">
-<li><a href="#_using_existing_zookeeper_ensemble">151. Using existing ZooKeeper ensemble</a></li>
-<li><a href="#zk.sasl.auth">152. SASL Authentication with ZooKeeper</a></li>
+<li><a href="#_using_existing_zookeeper_ensemble">153. Using existing ZooKeeper ensemble</a></li>
+<li><a href="#zk.sasl.auth">154. SASL Authentication with ZooKeeper</a></li>
 </ul>
 </li>
 <li><a href="#community">Community</a>
 <ul class="sectlevel1">
-<li><a href="#_decisions">153. Decisions</a></li>
-<li><a href="#community.roles">154. Community Roles</a></li>
-<li><a href="#hbase.commit.msg.format">155. Commit Message format</a></li>
+<li><a href="#_decisions">155. Decisions</a></li>
+<li><a href="#community.roles">156. Community Roles</a></li>
+<li><a href="#hbase.commit.msg.format">157. Commit Message format</a></li>
 </ul>
 </li>
 <li><a href="#_appendix">Appendix</a>
@@ -273,7 +279,7 @@
 <li><a href="#hbck.in.depth">Appendix C: hbck In Depth</a></li>
 <li><a href="#appendix_acl_matrix">Appendix D: Access Control Matrix</a></li>
 <li><a href="#compression">Appendix E: Compression and Data Block Encoding In HBase</a></li>
-<li><a href="#data.block.encoding.enable">156. Enable Data Block Encoding</a></li>
+<li><a href="#data.block.encoding.enable">158. Enable Data Block Encoding</a></li>
 <li><a href="#sql">Appendix F: SQL over HBase</a></li>
 <li><a href="#_ycsb">Appendix G: YCSB</a></li>
 <li><a href="#_hfile_format_2">Appendix H: HFile format</a></li>
@@ -282,8 +288,8 @@
 <li><a href="#asf">Appendix K: HBase and the Apache Software Foundation</a></li>
 <li><a href="#orca">Appendix L: Apache HBase Orca</a></li>
 <li><a href="#tracing">Appendix M: Enabling Dapper-like Tracing in HBase</a></li>
-<li><a href="#tracing.client.modifications">157. Client Modifications</a></li>
-<li><a href="#tracing.client.shell">158. Tracing from HBase Shell</a></li>
+<li><a href="#tracing.client.modifications">159. Client Modifications</a></li>
+<li><a href="#tracing.client.shell">160. Tracing from HBase Shell</a></li>
 <li><a href="#hbase.rpc">Appendix N: 0.95 RPC Specification</a></li>
 </ul>
 </li>
@@ -7795,7 +7801,80 @@ online schema changes are supported in the 0.92.x codebase, but the 0.90.x codeb
 </div>
 </div>
 <div class="sect1">
-<h2 id="number.of.cfs"><a class="anchor" href="#number.of.cfs"></a>33. On the number of column families</h2>
+<h2 id="table_schema_rules_of_thumb"><a class="anchor" href="#table_schema_rules_of_thumb"></a>33. Table Schema Rules Of Thumb</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>There are many different data sets, with different access patterns and service-level
+expectations. Therefore, these rules of thumb are only an overview. Read the rest
+of this chapter to get more details after you have gone through this list.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Aim to have regions sized between 10 and 50 GB (see the sizing sketch just after this list).</p>
+</li>
+<li>
+<p>Aim to have cells no larger than 10 MB, or 50 MB if you use <a href="#mob">[mob]</a>. Otherwise,
+consider storing your cell data in HDFS and storing a pointer to the data in HBase.</p>
+</li>
+<li>
+<p>A typical schema has between 1 and 3 column families per table. HBase tables should
+not be designed to mimic RDBMS tables.</p>
+</li>
+<li>
+<p>Around 50-100 regions is a good number for a table with 1 or 2 column families.
+Remember that a region is a contiguous segment of a column family.</p>
+</li>
+<li>
+<p>Keep your column family names as short as possible. The column family names are
+stored for every value (ignoring prefix encoding). They should not be self-documenting
+and descriptive like in a typical RDBMS.</p>
+</li>
+<li>
+<p>If you are storing time-based machine data or logging information, and the row key
+is based on device ID or service ID plus time, you can end up with a pattern where
+older data regions never have additional writes beyond a certain age. In this type
+of situation, you end up with a small number of active regions and a large number
+of older regions which have no new writes. For these situations, you can tolerate
+a larger number of regions because your resource consumption is driven by the active
+regions only.</p>
+</li>
+<li>
+<p>If only one column family is busy with writes, only that column family accumulates
+memory. Be aware of write patterns when allocating resources.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
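
For illustration, a minimal sketch of applying the region-size rule per table with the HBase 1.x client API (the 20 GB value, table name "t1", and family name "d" are placeholders, not recommendations):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t1"));
      desc.addFamily(new HColumnDescriptor("d"));      // short family name, per the rule above
      desc.setMaxFileSize(20L * 1024 * 1024 * 1024);   // split at ~20 GB, inside the 10-50 GB band
      admin.createTable(desc);
    }
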
+<h1 id="regionserver_sizing_rules_of_thumb" class="sect0"><a class="anchor" href="#regionserver_sizing_rules_of_thumb"></a>RegionServer Sizing Rules of Thumb</h1>
+<div class="openblock partintro">
+<div class="content">
+<div class="paragraph">
+<p>Lars Hofhansl wrote a great
+<a href="http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html">blog post</a>
+about RegionServer memory sizing. The upshot is that you probably need more memory
+than you think you need. He goes into the impact of region size, memstore size, HDFS
+replication factor, and other things to check.</p>
+</div>
+<div class="quoteblock">
+<blockquote>
+<div class="paragraph">
+<p>Personally I would place the maximum disk space per machine that can be served
+exclusively with HBase around 6T, unless you have a very read-heavy workload.
+In that case the Java heap should be 32GB (20G regions, 128M memstores, the rest
+defaults).</p>
+</div>
+</blockquote>
+<div class="attribution">
+&#8212; Lars Hofhansl<br>
+<cite>http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html</cite>
+</div>
+</div>
+</div>
+</div>
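
As a worked example of why the heap needs to be sized generously: with a 32 GB heap and the default hbase.regionserver.global.memstore.size of 0.4, roughly 12.8 GB is available for memstores; at 128 MB per memstore and one column family per region, that supports on the order of 12.8 GB / 128 MB = 100 actively written regions per RegionServer before flushes are forced early.
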
+<div class="sect1">
+<h2 id="number.of.cfs"><a class="anchor" href="#number.of.cfs"></a>34. On the number of column families</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>HBase currently does not do well with anything above two or three column families, so keep the number of column families in your schema low.
@@ -7808,7 +7887,7 @@ Only introduce a second and third column family in the case where data access is
 you query one column family or the other but usually not both at the one time.</p>
 </div>
 <div class="sect2">
-<h3 id="number.of.cfs.card"><a class="anchor" href="#number.of.cfs.card"></a>33.1. Cardinality of ColumnFamilies</h3>
+<h3 id="number.of.cfs.card"><a class="anchor" href="#number.of.cfs.card"></a>34.1. Cardinality of ColumnFamilies</h3>
 <div class="paragraph">
 <p>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA&#8217;s data will likely be spread across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.</p>
 </div>
@@ -7816,10 +7895,10 @@ you query one column family or the other but usually not both at the one time.</
 </div>
 </div>
 <div class="sect1">
-<h2 id="rowkey.design"><a class="anchor" href="#rowkey.design"></a>34. Rowkey Design</h2>
+<h2 id="rowkey.design"><a class="anchor" href="#rowkey.design"></a>35. Rowkey Design</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="_hotspotting"><a class="anchor" href="#_hotspotting"></a>34.1. Hotspotting</h3>
+<h3 id="_hotspotting"><a class="anchor" href="#_hotspotting"></a>35.1. Hotspotting</h3>
 <div class="paragraph">
 <p>Rows in HBase are sorted lexicographically by row key.
 This design optimizes for scans, allowing you to store related rows, or rows that will be read together, near each other.
@@ -7915,7 +7994,7 @@ This effectively randomizes row keys, but sacrifices row ordering properties.</p
 </div>
 </div>
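
For illustration, a minimal sketch of the salting technique this section describes, using a one-byte prefix derived from a hash of the key (the bucket count and key layout are placeholders):

    import org.apache.hadoop.hbase.util.Bytes;

    int numBuckets = 8;                              // placeholder; often sized to the region count
    String logicalKey = "foo0003";                   // the natural, monotonically ordered key
    byte salt = (byte) ((logicalKey.hashCode() & 0x7fffffff) % numBuckets);
    byte[] rowKey = Bytes.add(new byte[] { salt }, Bytes.toBytes(logicalKey));
    // The cost: a scan over a logical key range must now fan out into
    // numBuckets scans, one per salt prefix, and merge the results.
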
 <div class="sect2">
-<h3 id="timeseries"><a class="anchor" href="#timeseries"></a>34.2. Monotonically Increasing Row Keys/Timeseries Data</h3>
+<h3 id="timeseries"><a class="anchor" href="#timeseries"></a>35.2. Monotonically Increasing Row Keys/Timeseries Data</h3>
 <div class="paragraph">
 <p>In the HBase chapter of Tom White&#8217;s book <a href="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</a> (O&#8217;Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table&#8217;s regions (and thus, a single node), then moving onto the next region, etc.
 With monotonically increasing row-keys (i.e., using a timestamp), this will happen.
@@ -7934,7 +8013,7 @@ Thus, even with a continual stream of input data with a mix of metric types, the
 </div>
 </div>
 <div class="sect2">
-<h3 id="keysize"><a class="anchor" href="#keysize"></a>34.3. Try to minimize row and column sizes</h3>
+<h3 id="keysize"><a class="anchor" href="#keysize"></a>35.3. Try to minimize row and column sizes</h3>
 <div class="paragraph">
 <p>In HBase, values are always freighted with their coordinates; as a cell value passes through the system, it&#8217;ll be accompanied by its row, column name, and timestamp - always.
 If your rows and column names are large, especially compared to the size of the cell value, then you may run up against some interesting scenarios.
@@ -7951,7 +8030,7 @@ Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they
 <p>See <a href="#keyvalue">keyvalue</a> for more information on how HBase stores data internally, and why this is important.</p>
 </div>
 <div class="sect3">
-<h4 id="keysize.cf"><a class="anchor" href="#keysize.cf"></a>34.3.1. Column Families</h4>
+<h4 id="keysize.cf"><a class="anchor" href="#keysize.cf"></a>35.3.1. Column Families</h4>
 <div class="paragraph">
 <p>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).</p>
 </div>
@@ -7960,7 +8039,7 @@ Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they
 </div>
 </div>
 <div class="sect3">
-<h4 id="keysize.attributes"><a class="anchor" href="#keysize.attributes"></a>34.3.2. Attributes</h4>
+<h4 id="keysize.attributes"><a class="anchor" href="#keysize.attributes"></a>35.3.2. Attributes</h4>
 <div class="paragraph">
 <p>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via") to store in HBase.</p>
 </div>
@@ -7969,7 +8048,7 @@ Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they
 </div>
 </div>
 <div class="sect3">
-<h4 id="keysize.row"><a class="anchor" href="#keysize.row"></a>34.3.3. Rowkey Length</h4>
+<h4 id="keysize.row"><a class="anchor" href="#keysize.row"></a>35.3.3. Rowkey Length</h4>
 <div class="paragraph">
 <p>Keep them as short as is reasonable such that they can still be useful for required data access (e.g. Get vs.
 Scan). A short key that is useless for data access is not better than a longer key with better get/scan properties.
@@ -7977,7 +8056,7 @@ Expect tradeoffs when designing rowkeys.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="keysize.patterns"><a class="anchor" href="#keysize.patterns"></a>34.3.4. Byte Patterns</h4>
+<h4 id="keysize.patterns"><a class="anchor" href="#keysize.patterns"></a>35.3.4. Byte Patterns</h4>
 <div class="paragraph">
 <p>A long is 8 bytes.
 You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes.
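
The difference is easy to demonstrate with org.apache.hadoop.hbase.util.Bytes:

    import org.apache.hadoop.hbase.util.Bytes;

    long l = 1234567890L;
    byte[] lb = Bytes.toBytes(l);
    System.out.println(lb.length);                   // 8 bytes, for any long
    byte[] sb = Bytes.toBytes(String.valueOf(l));
    System.out.println(sb.length);                   // 10 bytes, and it grows with the number
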
@@ -8033,7 +8112,7 @@ This is the main trade-off.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="reverse.timestamp"><a class="anchor" href="#reverse.timestamp"></a>34.4. Reverse Timestamps</h3>
+<h3 id="reverse.timestamp"><a class="anchor" href="#reverse.timestamp"></a>35.4. Reverse Timestamps</h3>
 <div class="admonitionblock note">
 <table>
 <tr>
@@ -8065,14 +8144,14 @@ Since HBase keys are in sorted order, this key sorts before any older row-keys f
 </div>
 </div>
 <div class="sect2">
-<h3 id="rowkey.scope"><a class="anchor" href="#rowkey.scope"></a>34.5. Rowkeys and ColumnFamilies</h3>
+<h3 id="rowkey.scope"><a class="anchor" href="#rowkey.scope"></a>35.5. Rowkeys and ColumnFamilies</h3>
 <div class="paragraph">
 <p>Rowkeys are scoped to ColumnFamilies.
 Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="changing.rowkeys"><a class="anchor" href="#changing.rowkeys"></a>34.6. Immutability of Rowkeys</h3>
+<h3 id="changing.rowkeys"><a class="anchor" href="#changing.rowkeys"></a>35.6. Immutability of Rowkeys</h3>
 <div class="paragraph">
 <p>Rowkeys cannot be changed.
 The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
@@ -8080,7 +8159,7 @@ This is a fairly common question on the HBase dist-list so it pays to get the ro
 </div>
 </div>
 <div class="sect2">
-<h3 id="rowkey.regionsplits"><a class="anchor" href="#rowkey.regionsplits"></a>34.7. Relationship Between RowKeys and Region Splits</h3>
+<h3 id="rowkey.regionsplits"><a class="anchor" href="#rowkey.regionsplits"></a>35.7. Relationship Between RowKeys and Region Splits</h3>
 <div class="paragraph">
 <p>If you pre-split your table, it is <em>critical</em> to understand how your rowkey will be distributed across the region boundaries.
 As an example of why this is important, consider using displayable hex characters as the lead position of the key (e.g., "0000000000000000" to "ffffffffffffffff"). Running those key ranges through <code>Bytes.split</code> (which is the split strategy used when creating regions in <code>Admin.createTable(byte[] startKey, byte[] endKey, numRegions)</code>) for 10 regions will generate the following splits&#8230;&#8203;</p>
@@ -8152,10 +8231,10 @@ Know your data.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="schema.versions"><a class="anchor" href="#schema.versions"></a>35. Number of Versions</h2>
+<h2 id="schema.versions"><a class="anchor" href="#schema.versions"></a>36. Number of Versions</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="schema.versions.max"><a class="anchor" href="#schema.versions.max"></a>35.1. Maximum Number of Versions</h3>
+<h3 id="schema.versions.max"><a class="anchor" href="#schema.versions.max"></a>36.1. Maximum Number of Versions</h3>
 <div class="paragraph">
 <p>The maximum number of row versions to store is configured per column family via <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</a>.
 The default for max versions is 1.
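
For illustration, a minimal sketch of raising the maximum on a column family (the family name and the value 3 are placeholders):

    import org.apache.hadoop.hbase.HColumnDescriptor;

    HColumnDescriptor hcd = new HColumnDescriptor("d");
    hcd.setMaxVersions(3);   // keep up to 3 versions of each cell in this family
    // Shell equivalent: alter 't1', NAME => 'd', VERSIONS => 3
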
@@ -8167,7 +8246,7 @@ The number of max versions may need to be increased or decreased depending on ap
 </div>
 </div>
 <div class="sect2">
-<h3 id="schema.minversions"><a class="anchor" href="#schema.minversions"></a>35.2. Minimum Number of Versions</h3>
+<h3 id="schema.minversions"><a class="anchor" href="#schema.minversions"></a>36.2. Minimum Number of Versions</h3>
 <div class="paragraph">
 <p>Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</a>.
 The default for min versions is 0, which means the feature is disabled.
@@ -8177,7 +8256,7 @@ The minimum number of row versions parameter is used together with the time-to-l
 </div>
 </div>
 <div class="sect1">
-<h2 id="supported.datatypes"><a class="anchor" href="#supported.datatypes"></a>36. Supported Datatypes</h2>
+<h2 id="supported.datatypes"><a class="anchor" href="#supported.datatypes"></a>37. Supported Datatypes</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>HBase supports a "bytes-in/bytes-out" interface via <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</a> and <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</a>, so anything that can be converted to an array of bytes can be stored as a value.
@@ -8189,7 +8268,7 @@ All rows in HBase conform to the <a href="#datamodel">Data Model</a>, and that i
 Take that into consideration when making your design, as well as block size for the ColumnFamily.</p>
 </div>
 <div class="sect2">
-<h3 id="_counters"><a class="anchor" href="#_counters"></a>36.1. Counters</h3>
+<h3 id="_counters"><a class="anchor" href="#_counters"></a>37.1. Counters</h3>
 <div class="paragraph">
 <p>One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</a> in <code>Table</code>.</p>
 </div>
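
For illustration, a minimal sketch of an atomic increment (table, family, and qualifier names are placeholders, and 'connection' is assumed to be an open Connection):

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    Table table = connection.getTable(TableName.valueOf("counters"));
    long hits = table.incrementColumnValue(Bytes.toBytes("page#/index"),
        Bytes.toBytes("d"), Bytes.toBytes("hits"), 1L);   // atomic server-side +1
    table.close();
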
@@ -8200,7 +8279,7 @@ Take that into consideration when making your design, as well as block size for
 </div>
 </div>
 <div class="sect1">
-<h2 id="schema.joins"><a class="anchor" href="#schema.joins"></a>37. Joins</h2>
+<h2 id="schema.joins"><a class="anchor" href="#schema.joins"></a>38. Joins</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>If you have multiple tables, don&#8217;t forget to factor in the potential for <a href="#joins">[joins]</a> into the schema design.</p>
@@ -8208,7 +8287,7 @@ Take that into consideration when making your design, as well as block size for
 </div>
 </div>
 <div class="sect1">
-<h2 id="ttl"><a class="anchor" href="#ttl"></a>38. Time To Live (TTL)</h2>
+<h2 id="ttl"><a class="anchor" href="#ttl"></a>39. Time To Live (TTL)</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
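
For illustration, a minimal sketch of setting a ColumnFamily TTL (the one-week value and family name are placeholders):

    import org.apache.hadoop.hbase.HColumnDescriptor;

    HColumnDescriptor hcd = new HColumnDescriptor("d");
    hcd.setTimeToLive(7 * 24 * 60 * 60);   // seconds; expired cells are dropped at major compaction
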
@@ -8243,7 +8322,7 @@ There are two notable differences between cell TTL handling and ColumnFamily TTL
 </div>
 </div>
 <div class="sect1">
-<h2 id="cf.keep.deleted"><a class="anchor" href="#cf.keep.deleted"></a>39. Keeping Deleted Cells</h2>
+<h2 id="cf.keep.deleted"><a class="anchor" href="#cf.keep.deleted"></a>40. Keeping Deleted Cells</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>By default, delete markers extend back to the beginning of time.
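
For illustration, a minimal sketch of enabling the KEEP_DELETED_CELLS behavior covered in this chapter, assuming the enum-based setter in the HBase 1.x API (the family name is a placeholder):

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.KeepDeletedCells;

    HColumnDescriptor hcd = new HColumnDescriptor("d");
    hcd.setKeepDeletedCells(KeepDeletedCells.TRUE);   // deleted cells stay visible to raw and time-range scans
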
@@ -8384,7 +8463,7 @@ So with KEEP_DELETED_CELLS enabled deleted cells would get removed if either you
 </div>
 </div>
 <div class="sect1">
-<h2 id="secondary.indexes"><a class="anchor" href="#secondary.indexes"></a>40. Secondary Indexes and Alternate Query Paths</h2>
+<h2 id="secondary.indexes"><a class="anchor" href="#secondary.indexes"></a>41. Secondary Indexes and Alternate Query Paths</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>This section could also be titled "what if my table rowkey looks like <em>this</em> but I also want to query my table like <em>that</em>." A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain time ranges.
@@ -8427,7 +8506,7 @@ However, HBase scales better at larger data volumes, so this is a feature trade-
 <p>Additionally, see the David Butler response in this dist-list thread <a href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&amp;subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</a></p>
 </div>
 <div class="sect2">
-<h3 id="secondary.indexes.filter"><a class="anchor" href="#secondary.indexes.filter"></a>40.1. Filter Query</h3>
+<h3 id="secondary.indexes.filter"><a class="anchor" href="#secondary.indexes.filter"></a>41.1. Filter Query</h3>
 <div class="paragraph">
 <p>Depending on the case, it may be appropriate to use <a href="#client.filter">Client Request Filters</a>.
 In this case, no secondary index is created.
@@ -8435,7 +8514,7 @@ However, don&#8217;t try a full-scan on a large table like this from an applicat
 </div>
 </div>
 <div class="sect2">
-<h3 id="secondary.indexes.periodic"><a class="anchor" href="#secondary.indexes.periodic"></a>40.2. Periodic-Update Secondary Index</h3>
+<h3 id="secondary.indexes.periodic"><a class="anchor" href="#secondary.indexes.periodic"></a>41.2. Periodic-Update Secondary Index</h3>
 <div class="paragraph">
 <p>A secondary index could be created in another table which is periodically updated via a MapReduce job.
 The job could be executed intra-day, but depending on load-strategy it could still potentially be out of sync with the main data table.</p>
@@ -8445,13 +8524,13 @@ The job could be executed intra-day, but depending on load-strategy it could sti
 </div>
 </div>
 <div class="sect2">
-<h3 id="secondary.indexes.dualwrite"><a class="anchor" href="#secondary.indexes.dualwrite"></a>40.3. Dual-Write Secondary Index</h3>
+<h3 id="secondary.indexes.dualwrite"><a class="anchor" href="#secondary.indexes.dualwrite"></a>41.3. Dual-Write Secondary Index</h3>
 <div class="paragraph">
 <p>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table). If this approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <a href="#secondary.indexes.periodic">secondary.indexes.periodic</a>).</p>
 </div>
 </div>
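
For illustration, a minimal sketch of the dual-write pattern (table names, family "d", and key layouts are placeholders; note that there is no cross-table atomicity, so the application must handle a failed second write):

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    Table data  = connection.getTable(TableName.valueOf("events"));        // 'connection' assumed open
    Table index = connection.getTable(TableName.valueOf("events_by_ts"));

    byte[] user = Bytes.toBytes("user123");
    byte[] ts   = Bytes.toBytes(System.currentTimeMillis());
    data.put(new Put(Bytes.add(user, ts))
        .addColumn(Bytes.toBytes("d"), Bytes.toBytes("event"), Bytes.toBytes("login")));
    index.put(new Put(Bytes.add(ts, user))                                 // inverted key for time-range queries
        .addColumn(Bytes.toBytes("d"), Bytes.toBytes("user"), user));
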
 <div class="sect2">
-<h3 id="secondary.indexes.summary"><a class="anchor" href="#secondary.indexes.summary"></a>40.4. Summary Tables</h3>
+<h3 id="secondary.indexes.summary"><a class="anchor" href="#secondary.indexes.summary"></a>41.4. Summary Tables</h3>
 <div class="paragraph">
 <p>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
 These would be generated with MapReduce jobs into another table.</p>
@@ -8461,7 +8540,7 @@ These would be generated with MapReduce jobs into another table.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="secondary.indexes.coproc"><a class="anchor" href="#secondary.indexes.coproc"></a>40.5. Coprocessor Secondary Index</h3>
+<h3 id="secondary.indexes.coproc"><a class="anchor" href="#secondary.indexes.coproc"></a>41.5. Coprocessor Secondary Index</h3>
 <div class="paragraph">
 <p>Coprocessors act like RDBMS triggers. These were added in 0.92.
 For more information, see <a href="#coprocessors">coprocessors</a>.</p>
@@ -8470,7 +8549,7 @@ For more information, see <a href="#coprocessors">coprocessors</a></p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_constraints"><a class="anchor" href="#_constraints"></a>41. Constraints</h2>
+<h2 id="_constraints"><a class="anchor" href="#_constraints"></a>42. Constraints</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>HBase currently supports 'constraints' in traditional (SQL) database parlance.
@@ -8485,7 +8564,7 @@ since version 0.94.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="schema.casestudies"><a class="anchor" href="#schema.casestudies"></a>42. Schema Design Case Studies</h2>
+<h2 id="schema.casestudies"><a class="anchor" href="#schema.casestudies"></a>43. Schema Design Case Studies</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>The following will describe some typical data ingestion use-cases with HBase, and how the rowkey design and construction can be approached.
@@ -8518,7 +8597,7 @@ Know your data, and know your processing requirements.</p>
 </ul>
 </div>
 <div class="sect2">
-<h3 id="schema.casestudies.log_timeseries"><a class="anchor" href="#schema.casestudies.log_timeseries"></a>42.1. Case Study - Log Data and Timeseries Data</h3>
+<h3 id="schema.casestudies.log_timeseries"><a class="anchor" href="#schema.casestudies.log_timeseries"></a>43.1. Case Study - Log Data and Timeseries Data</h3>
 <div class="paragraph">
 <p>Assume that the following data elements are being collected.</p>
 </div>
@@ -8542,7 +8621,7 @@ Know your data, and know your processing requirements.</p>
 <p>We can store them in an HBase table called LOG_DATA, but what will the rowkey be? From these attributes the rowkey will be some combination of hostname, timestamp, and log-event - but what specifically?</p>
 </div>
 <div class="sect3">
-<h4 id="schema.casestudies.log_timeseries.tslead"><a class="anchor" href="#schema.casestudies.log_timeseries.tslead"></a>42.1.1. Timestamp In The Rowkey Lead Position</h4>
+<h4 id="schema.casestudies.log_timeseries.tslead"><a class="anchor" href="#schema.casestudies.log_timeseries.tslead"></a>43.1.1. Timestamp In The Rowkey Lead Position</h4>
 <div class="paragraph">
 <p>The rowkey <code>[timestamp][hostname][log-event]</code> suffers from the monotonically increasing rowkey problem described in <a href="#timeseries">Monotonically Increasing Row Keys/Timeseries Data</a>.</p>
 </div>
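
For illustration, a minimal sketch of the bucketing approach discussed below (the bucket count and field values are placeholders):

    import org.apache.hadoop.hbase.util.Bytes;

    int numBuckets = 16;                           // placeholder; hard to change after the fact
    long timestamp = System.currentTimeMillis();
    String hostname = "host1", logEvent = "e1";
    long bucket = timestamp % numBuckets;          // spreads a monotonic timestamp across buckets
    byte[] rowKey = Bytes.add(Bytes.toBytes(bucket),
        Bytes.add(Bytes.toBytes(timestamp), Bytes.toBytes(hostname + logEvent)));
    // Reading a time range now requires numBuckets scans, merged client-side.
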
@@ -8570,14 +8649,14 @@ Attention must be paid to the number of buckets, because this will require the s
 </div>
 </div>
 <div class="sect3">
-<h4 id="schema.casestudies.log_timeseries.hostlead"><a class="anchor" href="#schema.casestudies.log_timeseries.hostlead"></a>42.1.2. Host In The Rowkey Lead Position</h4>
+<h4 id="schema.casestudies.log_timeseries.hostlead"><a class="anchor" href="#schema.casestudies.log_timeseries.hostlead"></a>43.1.2. Host In The Rowkey Lead Position</h4>
 <div class="paragraph">
 <p>The rowkey <code>[hostname][log-event][timestamp]</code> is a candidate if there is a large-ish number of hosts to spread the writes and reads across the keyspace.
 This approach would be useful if scanning by hostname was a priority.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="schema.casestudies.log_timeseries.revts"><a class="anchor" href="#schema.casestudies.log_timeseries.revts"></a>42.1.3. Timestamp, or Reverse Timestamp?</h4>
+<h4 id="schema.casestudies.log_timeseries.revts"><a class="anchor" href="#schema.casestudies.log_timeseries.revts"></a>43.1.3. Timestamp, or Reverse Timestamp?</h4>
 <div class="paragraph">
 <p>If the most important access path is to pull the most recent events, then storing the timestamps as reverse-timestamps (e.g., <code>timestamp = Long.MAX_VALUE - timestamp</code>) will create the property of being able to do a Scan on <code>[hostname][log-event]</code> to quickly obtain the most recently captured events.</p>
 </div>
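
For illustration, a minimal sketch of building such a reverse-timestamp key (the hostname and event values are placeholders):

    import org.apache.hadoop.hbase.util.Bytes;

    long reverseTs = Long.MAX_VALUE - System.currentTimeMillis();
    byte[] rowKey = Bytes.add(Bytes.toBytes("host1" + "e1"), Bytes.toBytes(reverseTs));
    // Newest events sort first, so a Scan starting at [hostname][log-event]
    // returns the most recently captured events without reading to the end.
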
@@ -8603,7 +8682,7 @@ See <a href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Sca
 </div>
 </div>
 <div class="sect3">
-<h4 id="schema.casestudies.log_timeseries.varkeys"><a class="anchor" href="#schema.casestudies.log_timeseries.varkeys"></a>42.1.4. Variable Length or Fixed Length Rowkeys?</h4>
+<h4 id="schema.casestudies.log_timeseries.varkeys"><a class="anchor" href="#schema.casestudies.log_timeseries.varkeys"></a>43.1.4. Variable Length or Fixed Length Rowkeys?</h4>
 <div class="paragraph">
 <p>It is critical to remember that rowkeys are stamped on every column in HBase.
 If the hostname is <code>a</code> and the event type is <code>e1</code> then the resulting rowkey would be quite small.
@@ -8674,7 +8753,7 @@ by using an
 </div>
 </div>
 <div class="sect2">
-<h3 id="schema.casestudies.log_steroids"><a class="anchor" href="#schema.casestudies.log_steroids"></a>42.2. Case Study - Log Data and Timeseries Data on Steroids</h3>
+<h3 id="schema.casestudies.log_steroids"><a class="anchor" href="#schema.casestudies.log_steroids"></a>43.2. Case Study - Log Data and Timeseries Data on Steroids</h3>
 <div class="paragraph">
 <p>This effectively is the OpenTSDB approach.
 What OpenTSDB does is re-write data and pack rows into columns for certain time-periods.
@@ -8705,7 +8784,7 @@ from HBaseCon2012.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="schema.casestudies.custorder"><a class="anchor" href="#schema.casestudies.custorder"></a>42.3. Case Study - Customer/Order</h3>
+<h3 id="schema.casestudies.custorder"><a class="anchor" href="#schema.casestudies.custorder"></a>43.3. Case Study - Customer/Order</h3>
 <div class="paragraph">
 <p>Assume that HBase is used to store customer and order information.
 There are two core record-types being ingested: a Customer record type, and Order record type.</p>
@@ -8791,7 +8870,7 @@ What is the keyspace of the customer number, and what is the format (e.g., numer
 </ul>
 </div>
 <div class="sect3">
-<h4 id="schema.casestudies.custorder.tables"><a class="anchor" href="#schema.casestudies.custorder.tables"></a>42.3.1. Single Table? Multiple Tables?</h4>
+<h4 id="schema.casestudies.custorder.tables"><a class="anchor" href="#schema.casestudies.custorder.tables"></a>43.3.1. Single Table? Multiple Tables?</h4>
 <div class="paragraph">
 <p>A traditional design approach would have separate tables for CUSTOMER and SALES.
 Another option is to pack multiple record types into a single table (e.g., CUSTOMER++).</p>
@@ -8830,7 +8909,7 @@ Another option is to pack multiple record types into a single table (e.g., CUSTO
 </div>
 </div>
 <div class="sect3">
-<h4 id="schema.casestudies.custorder.obj"><a class="anchor" href="#schema.casestudies.custorder.obj"></a>42.3.2. Order Object Design</h4>
+<h4 id="schema.casestudies.custorder.obj"><a class="anchor" href="#schema.casestudies.custorder.obj"></a>43.3.2. Order Object Design</h4>
 <div class="paragraph">
 <p>Now we need to address how to model the Order object.
 Assume that the class structure is as follows:</p>
@@ -9036,13 +9115,13 @@ Care should be taken with this approach to ensure backward compatibility in case
 </div>
 </div>
 <div class="sect2">
-<h3 id="schema.smackdown"><a class="anchor" href="#schema.smackdown"></a>42.4. Case Study - "Tall/Wide/Middle" Schema Design Smackdown</h3>
+<h3 id="schema.smackdown"><a class="anchor" href="#schema.smackdown"></a>43.4. Case Study - "Tall/Wide/Middle" Schema Design Smackdown</h3>
 <div class="paragraph">
 <p>This section will describe additional schema design questions that appear on the dist-list, specifically about tall and wide tables.
 These are general guidelines and not laws - each application must consider its own needs.</p>
 </div>
 <div class="sect3">
-<h4 id="schema.smackdown.rowsversions"><a class="anchor" href="#schema.smackdown.rowsversions"></a>42.4.1. Rows vs. Versions</h4>
+<h4 id="schema.smackdown.rowsversions"><a class="anchor" href="#schema.smackdown.rowsversions"></a>43.4.1. Rows vs. Versions</h4>
 <div class="paragraph">
 <p>A common question is whether one should prefer rows or HBase&#8217;s built-in-versioning.
 The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that successive updates do not overwrite each other.</p>
@@ -9052,7 +9131,7 @@ The context is typically where there are "a lot" of versions of a row to be reta
 </div>
 </div>
 <div class="sect3">
-<h4 id="schema.smackdown.rowscols"><a class="anchor" href="#schema.smackdown.rowscols"></a>42.4.2. Rows vs. Columns</h4>
+<h4 id="schema.smackdown.rowscols"><a class="anchor" href="#schema.smackdown.rowscols"></a>43.4.2. Rows vs. Columns</h4>
 <div class="paragraph">
 <p>Another common question is whether one should prefer rows or columns.
 The context is typically in extreme cases of wide tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 column apiece.</p>
@@ -9063,7 +9142,7 @@ But there is also a middle path between these two options, and that is "Rows as
 </div>
 </div>
 <div class="sect3">
-<h4 id="schema.smackdown.rowsascols"><a class="anchor" href="#schema.smackdown.rowsascols"></a>42.4.3. Rows as Columns</h4>
+<h4 id="schema.smackdown.rowsascols"><a class="anchor" href="#schema.smackdown.rowsascols"></a>43.4.3. Rows as Columns</h4>
 <div class="paragraph">
 <p>The middle path between Rows vs.
 Columns is packing data that would be a separate row into columns, for certain rows.
@@ -9074,7 +9153,7 @@ For an overview of this approach, see <a href="#schema.casestudies.log_steroids"
 </div>
 </div>
 <div class="sect2">
-<h3 id="casestudies.schema.listdata"><a class="anchor" href="#casestudies.schema.listdata"></a>42.5. Case Study - List Data</h3>
+<h3 id="casestudies.schema.listdata"><a class="anchor" href="#casestudies.schema.listdata"></a>43.5. Case Study - List Data</h3>
 <div class="paragraph">
 <p>The following is an exchange from the user dist-list regarding a fairly common question: how to handle per-user list data in Apache HBase.</p>
 </div>
@@ -9189,7 +9268,7 @@ If you don&#8217;t have time to build it both ways and compare, my advice would
 </div>
 </div>
 <div class="sect1">
-<h2 id="schema.ops"><a class="anchor" href="#schema.ops"></a>43. Operational and Performance Configuration Options</h2>
+<h2 id="schema.ops"><a class="anchor" href="#schema.ops"></a>44. Operational and Performance Configuration Options</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>See the Performance section <a href="#perf.schema">perf.schema</a> for more information on operational and performance schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.</p>
@@ -9234,7 +9313,7 @@ In the notes below, we refer to o.a.h.h.mapreduce but replace with the o.a.h.h.m
 </div>
 </div>
 <div class="sect1">
-<h2 id="hbase.mapreduce.classpath"><a class="anchor" href="#hbase.mapreduce.classpath"></a>44. HBase, MapReduce, and the CLASSPATH</h2>
+<h2 id="hbase.mapreduce.classpath"><a class="anchor" href="#hbase.mapreduce.classpath"></a>45. HBase, MapReduce, and the CLASSPATH</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>By default, MapReduce jobs deployed to a MapReduce cluster do not have access to either the HBase configuration under <code>$HBASE_CONF_DIR</code> or the HBase classes.</p>
@@ -9390,7 +9469,7 @@ $ HADOOP_CLASSPATH=$(hbase classpath) hadoop jar MyJob.jar MyJobMainClass</code>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_mapreduce_scan_caching"><a class="anchor" href="#_mapreduce_scan_caching"></a>45. MapReduce Scan Caching</h2>
+<h2 id="_mapreduce_scan_caching"><a class="anchor" href="#_mapreduce_scan_caching"></a>46. MapReduce Scan Caching</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>TableMapReduceUtil now restores the option to set scanner caching (the number of rows which are cached before returning the result to the client) on the Scan object that is passed in.
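
For illustration, a minimal sketch of setting the cache size on the Scan handed to TableMapReduceUtil (the value 500, table name, and mapper class are placeholders):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;

    Scan scan = new Scan();
    scan.setCaching(500);         // rows fetched per RPC to the RegionServer
    scan.setCacheBlocks(false);   // don't churn the block cache with a full-table MR scan
    TableMapReduceUtil.initTableMapperJob("mytable", scan,
        MyMapper.class, Text.class, IntWritable.class, job);   // 'MyMapper' and 'job' assumed defined
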
@@ -9425,7 +9504,7 @@ If you think of the scan as a shovel, a bigger cache setting is analogous to a b
 </div>
 </div>
 <div class="sect1">
-<h2 id="_bundled_hbase_mapreduce_jobs"><a class="anchor" href="#_bundled_hbase_mapreduce_jobs"></a>46. Bundled HBase MapReduce Jobs</h2>
+<h2 id="_bundled_hbase_mapreduce_jobs"><a class="anchor" href="#_bundled_hbase_mapreduce_jobs"></a>47. Bundled HBase MapReduce Jobs</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>The HBase JAR also serves as a Driver for some bundled MapReduce jobs.
@@ -9456,7 +9535,7 @@ To run one of the jobs, model your command after the following example.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_hbase_as_a_mapreduce_job_data_source_and_data_sink"><a class="anchor" href="#_hbase_as_a_mapreduce_job_data_source_and_data_sink"></a>47. HBase as a MapReduce Job Data Source and Data Sink</h2>
+<h2 id="_hbase_as_a_mapreduce_job_data_source_and_data_sink"><a class="anchor" href="#_hbase_as_a_mapreduce_job_data_source_and_data_sink"></a>48. HBase as a MapReduce Job Data Source and Data Sink</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>HBase can be used as a data source, <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html">TableInputFormat</a>, and data sink, <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</a> or <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.html">MultiTableOutputFormat</a>, for MapReduce jobs.
@@ -9485,7 +9564,7 @@ Otherwise use the default partitioner.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_writing_hfiles_directly_during_bulk_import"><a class="anchor" href="#_writing_hfiles_directly_during_bulk_import"></a>48. Writing HFiles Directly During Bulk Import</h2>
+<h2 id="_writing_hfiles_directly_during_bulk_import"><a class="anchor" href="#_writing_hfiles_directly_during_bulk_import"></a>49. Writing HFiles Directly During Bulk Import</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>If you are importing into a new table, you can bypass the HBase API and write your content directly to the filesystem, formatted into HBase data files (HFiles). Your import will run faster, perhaps an order of magnitude faster.
@@ -9494,7 +9573,7 @@ For more on how this mechanism works, see <a href="#arch.bulk.load">Bulk Loading
 </div>
 </div>
 <div class="sect1">
-<h2 id="_rowcounter_example"><a class="anchor" href="#_rowcounter_example"></a>49. RowCounter Example</h2>
+<h2 id="_rowcounter_example"><a class="anchor" href="#_rowcounter_example"></a>50. RowCounter Example</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>The included <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</a> MapReduce job uses <code>TableInputFormat</code> and does a count of all rows in the specified table.
@@ -9515,17 +9594,17 @@ If you have classpath errors, see <a href="#hbase.mapreduce.classpath">HBase, Ma
 </div>
 </div>
 <div class="sect1">
-<h2 id="splitter"><a class="anchor" href="#splitter"></a>50. Map-Task Splitting</h2>
+<h2 id="splitter"><a class="anchor" href="#splitter"></a>51. Map-Task Splitting</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="splitter.default"><a class="anchor" href="#splitter.default"></a>50.1. The Default HBase MapReduce Splitter</h3>
+<h3 id="splitter.default"><a class="anchor" href="#splitter.default"></a>51.1. The Default HBase MapReduce Splitter</h3>
 <div class="paragraph">
 <p>When <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html">TableInputFormat</a> is used to source an HBase table in a MapReduce job, its splitter will make a map task for each region of the table.
 Thus, if there are 100 regions in the table, there will be 100 map-tasks for the job - regardless of how many column families are selected in the Scan.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="splitter.custom"><a class="anchor" href="#splitter.custom"></a>50.2. Custom Splitters</h3>
+<h3 id="splitter.custom"><a class="anchor" href="#splitter.custom"></a>51.2. Custom Splitters</h3>
 <div class="paragraph">
 <p>For those interested in implementing custom splitters, see the method <code>getSplits</code> in <a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html">TableInputFormatBase</a>.
 That is where the logic for map-task assignment resides.</p>
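
For illustration, a minimal sketch of a custom splitter that post-processes the default one-split-per-region assignment (the adjustment itself is left as a placeholder):

    import java.io.IOException;
    import java.util.List;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;

    public class MyTableInputFormat extends TableInputFormat {
      @Override
      public List<InputSplit> getSplits(JobContext context) throws IOException {
        List<InputSplit> splits = super.getSplits(context);   // one split per region by default
        // adjust here: e.g., subdivide hot regions or drop empty ones
        return splits;
      }
    }
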
@@ -9534,10 +9613,10 @@ That is where the logic for map-task assignment resides.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="mapreduce.example"><a class="anchor" href="#mapreduce.example"></a>51. HBase MapReduce Examples</h2>
+<h2 id="mapreduce.example"><a class="anchor" href="#mapreduce.example"></a>52. HBase MapReduce Examples</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="mapreduce.example.read"><a class="anchor" href="#mapreduce.example.read"></a>51.1. HBase MapReduce Read Example</h3>
+<h3 id="mapreduce.example.read"><a class="anchor" href="#mapreduce.example.read"></a>52.1. HBase MapReduce Read Example</h3>
 <div class="paragraph">
 <p>The following is an example of using HBase as a MapReduce source in a read-only manner.
 Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper.
@@ -9585,7 +9664,7 @@ job.setOutputFormatClass(NullOutputFormat.class);   <span class="comment">// bec
 </div>
 </div>
 <div class="sect2">
-<h3 id="mapreduce.example.readwrite"><a class="anchor" href="#mapreduce.example.readwrite"></a>51.2. HBase MapReduce Read/Write Example</h3>
+<h3 id="mapreduce.example.readwrite"><a class="anchor" href="#mapreduce.example.readwrite"></a>52.2. HBase MapReduce Read/Write Example</h3>
 <div class="paragraph">
 <p>The following is an example of using HBase both as a source and as a sink with MapReduce.
 This example will simply copy data from one table to another.</p>
@@ -9655,13 +9734,13 @@ Note: this is what the CopyTable utility does.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="mapreduce.example.readwrite.multi"><a class="anchor" href="#mapreduce.example.readwrite.multi"></a>51.3. HBase MapReduce Read/Write Example With Multi-Table Output</h3>
+<h3 id="mapreduce.example.readwrite.multi"><a class="anchor" href="#mapreduce.example.readwrite.multi"></a>52.3. HBase MapReduce Read/Write Example With Multi-Table Output</h3>
 <div class="paragraph">
 <p>TODO: example for <code>MultiTableOutputFormat</code>.</p>
 </div>
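 <div class="paragraph">
 <p>Until that example is written, here is a rough sketch of the idea: with <code>MultiTableOutputFormat</code>, the key emitted to the context names the destination table, so a single job can route mutations to several tables. The table names and the routing rule below are illustrative only.</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>public class RoutingMapper extends TableMapper&lt;ImmutableBytesWritable, Put&gt; {
   private static final ImmutableBytesWritable TABLE_A =
       new ImmutableBytesWritable(Bytes.toBytes("tableA"));
   private static final ImmutableBytesWritable TABLE_B =
       new ImmutableBytesWritable(Bytes.toBytes("tableB"));
 
   @Override
   public void map(ImmutableBytesWritable row, Result value, Context context)
       throws IOException, InterruptedException {
     Put put = new Put(row.get());
     for (Cell cell : value.rawCells()) {
       put.add(cell);  // copy the source cells unchanged
     }
     // Route by an arbitrary property of the row key (illustrative).
     ImmutableBytesWritable target = (row.get()[0] % 2 == 0) ? TABLE_A : TABLE_B;
     context.write(target, put);
   }
 }
 // In the job setup: job.setOutputFormatClass(MultiTableOutputFormat.class);</pre>
 </div>
 </div>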
 </div>
 <div class="sect2">
-<h3 id="mapreduce.example.summary"><a class="anchor" href="#mapreduce.example.summary"></a>51.4. HBase MapReduce Summary to HBase Example</h3>
+<h3 id="mapreduce.example.summary"><a class="anchor" href="#mapreduce.example.summary"></a>52.4. HBase MapReduce Summary to HBase Example</h3>
 <div class="paragraph">
 <p>The following example uses HBase as a MapReduce source and sink with a summarization step.
 This example will count the number of distinct instances of a value in a table and write those summarized counts to another table.</p>
@@ -9741,7 +9820,7 @@ This value is used as the key to emit from the mapper, and an <code>IntWritable<
 </div>
 </div>
 <div class="sect2">
-<h3 id="mapreduce.example.summary.file"><a class="anchor" href="#mapreduce.example.summary.file"></a>51.5. HBase MapReduce Summary to File Example</h3>
+<h3 id="mapreduce.example.summary.file"><a class="anchor" href="#mapreduce.example.summary.file"></a>52.5. HBase MapReduce Summary to File Example</h3>
 <div class="paragraph">
 <p>This is very similar to the summary example above, with the exception that this example uses HBase as the MapReduce source and HDFS as the sink.
 The differences are in the job setup and in the reducer.
@@ -9795,7 +9874,7 @@ As for the Reducer, it is a "generic" Reducer instead of extending TableMapper a
 </div>
 </div>
 <div class="sect2">
-<h3 id="mapreduce.example.summary.noreducer"><a class="anchor" href="#mapreduce.example.summary.noreducer"></a>51.6. HBase MapReduce Summary to HBase Without Reducer</h3>
+<h3 id="mapreduce.example.summary.noreducer"><a class="anchor" href="#mapreduce.example.summary.noreducer"></a>52.6. HBase MapReduce Summary to HBase Without Reducer</h3>
 <div class="paragraph">
 <p>It is also possible to perform summaries without a reducer, if you use HBase as the reducer.</p>
 </div>
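 <div class="paragraph">
 <p>One way to sketch this (the table and column names below are invented for illustration): the mapper increments a counter cell directly, so HBase performs the aggregation atomically and no reduce phase is needed.</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>public class IncrementingSummaryMapper extends TableMapper&lt;ImmutableBytesWritable, Put&gt; {
   private Connection connection;
   private Table summaryTable;
 
   @Override
   protected void setup(Context context) throws IOException {
     connection = ConnectionFactory.createConnection(context.getConfiguration());
     summaryTable = connection.getTable(TableName.valueOf("summary"));  // hypothetical target table
   }
 
   @Override
   public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
     byte[] val = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // assumed column
     // Let HBase aggregate: atomically bump the count for this value.
     summaryTable.incrementColumnValue(val, Bytes.toBytes("cf"), Bytes.toBytes("count"), 1L);
   }
 
   @Override
   protected void cleanup(Context context) throws IOException {
     summaryTable.close();
     connection.close();
   }
 }</pre>
 </div>
 </div>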
@@ -9810,7 +9889,7 @@ However, your mileage may vary depending on the number of rows to be processed a
 </div>
 </div>
 <div class="sect2">
-<h3 id="mapreduce.example.summary.rdbms"><a class="anchor" href="#mapreduce.example.summary.rdbms"></a>51.7. HBase MapReduce Summary to RDBMS</h3>
+<h3 id="mapreduce.example.summary.rdbms"><a class="anchor" href="#mapreduce.example.summary.rdbms"></a>52.7. HBase MapReduce Summary to RDBMS</h3>
 <div class="paragraph">
 <p>Sometimes it is more appropriate to generate summaries to an RDBMS.
 For these cases, it is possible to generate summaries directly to an RDBMS via a custom reducer.
@@ -9851,7 +9930,7 @@ Recognize that the more reducers that are assigned to the job, the more simultan
 </div>
 </div>
 <div class="sect1">
-<h2 id="mapreduce.htable.access"><a class="anchor" href="#mapreduce.htable.access"></a>52. Accessing Other HBase Tables in a MapReduce Job</h2>
+<h2 id="mapreduce.htable.access"><a class="anchor" href="#mapreduce.htable.access"></a>53. Accessing Other HBase Tables in a MapReduce Job</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Although the framework currently allows one HBase table as input to a MapReduce job, other HBase tables can be accessed as lookup tables, etc., in a MapReduce job by creating a Table instance in the setup method of the Mapper.</p>
@@ -9876,7 +9955,7 @@ Recognize that the more reducers that are assigned to the job, the more simultan
 </div>
 </div>
 <div class="sect1">
-<h2 id="mapreduce.specex"><a class="anchor" href="#mapreduce.specex"></a>53. Speculative Execution</h2>
+<h2 id="mapreduce.specex"><a class="anchor" href="#mapreduce.specex"></a>54. Speculative Execution</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>It is generally advisable to turn off speculative execution for MapReduce jobs that use HBase as a source.
@@ -9889,7 +9968,7 @@ Especially for longer running jobs, speculative execution will create duplicate
 </div>
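 <div class="paragraph">
 <p>With the MRv2 property names, this can be done per job, for example (a sketch):</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>Job job = Job.getInstance(conf, "hbase-sourced-job");
 // Avoid duplicate scans of the same regions by speculative task attempts.
 job.getConfiguration().setBoolean("mapreduce.map.speculative", false);
 job.getConfiguration().setBoolean("mapreduce.reduce.speculative", false);</pre>
 </div>
 </div>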
 </div>
 <div class="sect1">
-<h2 id="cascading"><a class="anchor" href="#cascading"></a>54. Cascading</h2>
+<h2 id="cascading"><a class="anchor" href="#cascading"></a>55. Cascading</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p><a href="http://www.cascading.org/">Cascading</a> is an alternative API for MapReduce, which
@@ -9978,7 +10057,7 @@ To protect existing HBase installations from exploitation, please <strong>do not
 </div>
 </div>
 <div class="sect1">
-<h2 id="_using_secure_http_https_for_the_web_ui"><a class="anchor" href="#_using_secure_http_https_for_the_web_ui"></a>55. Using Secure HTTP (HTTPS) for the Web UI</h2>
+<h2 id="_using_secure_http_https_for_the_web_ui"><a class="anchor" href="#_using_secure_http_https_for_the_web_ui"></a>56. Using Secure HTTP (HTTPS) for the Web UI</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>A default HBase install uses insecure HTTP connections for Web UIs for the master and region servers.
@@ -10031,7 +10110,7 @@ If you know how to fix this without opening a second port for HTTPS, patches are
 </div>
 </div>
 <div class="sect1">
-<h2 id="hbase.secure.configuration"><a class="anchor" href="#hbase.secure.configuration"></a>56. Secure Client Access to Apache HBase</h2>
+<h2 id="hbase.secure.configuration"><a class="anchor" href="#hbase.secure.configuration"></a>57. Secure Client Access to Apache HBase</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Newer releases of Apache HBase (&gt;= 0.92) support optional SASL authentication of clients.
@@ -10041,7 +10120,7 @@ See also Matteo Bertozzi&#8217;s article on <a href="http://www.cloudera.com/blo
 <p>This describes how to set up Apache HBase and clients for connection to secure HBase resources.</p>
 </div>
 <div class="sect2">
-<h3 id="security.prerequisites"><a class="anchor" href="#security.prerequisites"></a>56.1. Prerequisites</h3>
+<h3 id="security.prerequisites"><a class="anchor" href="#security.prerequisites"></a>57.1. Prerequisites</h3>
 <div class="dlist">
 <dl>
 <dt class="hdlist1">Hadoop Authentication Configuration</dt>
@@ -10058,7 +10137,7 @@ Otherwise, you would be using strong authentication for HBase but not for the un
 </div>
 </div>
 <div class="sect2">
-<h3 id="_server_side_configuration_for_secure_operation"><a class="anchor" href="#_server_side_configuration_for_secure_operation"></a>56.2. Server-side Configuration for Secure Operation</h3>
+<h3 id="_server_side_configuration_for_secure_operation"><a class="anchor" href="#_server_side_configuration_for_secure_operation"></a>57.2. Server-side Configuration for Secure Operation</h3>
 <div class="paragraph">
 <p>First, refer to <a href="#security.prerequisites">security.prerequisites</a> and ensure that your underlying HDFS configuration is secure.</p>
 </div>
@@ -10086,7 +10165,7 @@ Otherwise, you would be using strong authentication for HBase but not for the un
 </div>
 </div>
 <div class="sect2">
-<h3 id="_client_side_configuration_for_secure_operation"><a class="anchor" href="#_client_side_configuration_for_secure_operation"></a>56.3. Client-side Configuration for Secure Operation</h3>
+<h3 id="_client_side_configuration_for_secure_operation"><a class="anchor" href="#_client_side_configuration_for_secure_operation"></a>57.3. Client-side Configuration for Secure Operation</h3>
 <div class="paragraph">
 <p>First, refer to <a href="#security.prerequisites">Prerequisites</a> and ensure that your underlying HDFS configuration is secure.</p>
 </div>
@@ -10140,7 +10219,7 @@ conf.set(<span class="string"><span class="delimiter">&quot;</span><span class="
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.client.thrift"><a class="anchor" href="#security.client.thrift"></a>56.4. Client-side Configuration for Secure Operation - Thrift Gateway</h3>
+<h3 id="security.client.thrift"><a class="anchor" href="#security.client.thrift"></a>57.4. Client-side Configuration for Secure Operation - Thrift Gateway</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file for every Thrift gateway:</p>
 </div>
@@ -10190,7 +10269,7 @@ All client access via the Thrift gateway will use the Thrift gateway&#8217;s cre
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.gateway.thrift"><a class="anchor" href="#security.gateway.thrift"></a>56.5. Configure the Thrift Gateway to Authenticate on Behalf of the Client</h3>
+<h3 id="security.gateway.thrift"><a class="anchor" href="#security.gateway.thrift"></a>57.5. Configure the Thrift Gateway to Authenticate on Behalf of the Client</h3>
 <div class="paragraph">
 <p><a href="#security.client.thrift">Client-side Configuration for Secure Operation - Thrift Gateway</a> describes how to authenticate a Thrift client to HBase using a fixed user.
 As an alternative, you can configure the Thrift gateway to authenticate to HBase on the client&#8217;s behalf, and to access HBase using a proxy user.
@@ -10248,7 +10327,7 @@ To start Thrift on a node, run the command <code>bin/hbase-daemon.sh start thrif
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.gateway.thrift.doas"><a class="anchor" href="#security.gateway.thrift.doas"></a>56.6. Configure the Thrift Gateway to Use the <code>doAs</code> Feature</h3>
+<h3 id="security.gateway.thrift.doas"><a class="anchor" href="#security.gateway.thrift.doas"></a>57.6. Configure the Thrift Gateway to Use the <code>doAs</code> Feature</h3>
 <div class="paragraph">
 <p><a href="#security.gateway.thrift">Configure the Thrift Gateway to Authenticate on Behalf of the Client</a> describes how to configure the Thrift gateway to authenticate to HBase on the client&#8217;s behalf, and to access HBase using a proxy user. The limitation of this approach is that after the client is initialized with a particular set of credentials, it cannot change these credentials during the session. The <code>doAs</code> feature provides a flexible way to impersonate multiple principals using the same client. This feature was implemented in <a href="https://issues.apache.org/jira/browse/HBASE-12640">HBASE-12640</a> for Thrift 1, but is currently not available for Thrift 2.</p>
 </div>
@@ -10293,7 +10372,7 @@ to get an overall idea of how to use this feature in your client.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_client_side_configuration_for_secure_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_secure_operation_rest_gateway"></a>56.7. Client-side Configuration for Secure Operation - REST Gateway</h3>
+<h3 id="_client_side_configuration_for_secure_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_secure_operation_rest_gateway"></a>57.7. Client-side Configuration for Secure Operation - REST Gateway</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file for every REST gateway:</p>
 </div>
@@ -10370,7 +10449,7 @@ For more information, refer to <a href="http://hadoop.apache.org/docs/stable/had
 </div>
 </div>
 <div class="sect2">
-<h3 id="security.rest.gateway"><a class="anchor" href="#security.rest.gateway"></a>56.8. REST Gateway Impersonation Configuration</h3>
+<h3 id="security.rest.gateway"><a class="anchor" href="#security.rest.gateway"></a>57.8. REST Gateway Impersonation Configuration</h3>
 <div class="paragraph">
 <p>By default, the REST gateway doesn&#8217;t support impersonation.
 It accesses HBase on behalf of clients as the user configured in the previous section.
@@ -10432,7 +10511,7 @@ So it can apply proper authorizations.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="hbase.secure.simpleconfiguration"><a class="anchor" href="#hbase.secure.simpleconfiguration"></a>57. Simple User Access to Apache HBase</h2>
+<h2 id="hbase.secure.simpleconfiguration"><a class="anchor" href="#hbase.secure.simpleconfiguration"></a>58. Simple User Access to Apache HBase</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Newer releases of Apache HBase (&gt;= 0.92) support optional SASL authentication of clients.
@@ -10442,7 +10521,7 @@ See also Matteo Bertozzi&#8217;s article on <a href="http://www.cloudera.com/blo
 <p>This describes how to set up Apache HBase and clients for simple user access to HBase resources.</p>
 </div>
 <div class="sect2">
-<h3 id="_simple_versus_secure_access"><a class="anchor" href="#_simple_versus_secure_access"></a>57.1. Simple versus Secure Access</h3>
+<h3 id="_simple_versus_secure_access"><a class="anchor" href="#_simple_versus_secure_access"></a>58.1. Simple versus Secure Access</h3>
 <div class="paragraph">
 <p>The following section shows how to set up simple user access.
 Simple user access is not a secure method of operating HBase.
@@ -10456,13 +10535,13 @@ Refer to the section <a href="#hbase.secure.configuration">Secure Client Access
 </div>
 </div>
 <div class="sect2">
-<h3 id="_prerequisites"><a class="anchor" href="#_prerequisites"></a>57.2. Prerequisites</h3>
+<h3 id="_prerequisites"><a class="anchor" href="#_prerequisites"></a>58.2. Prerequisites</h3>
 <div class="paragraph">
 <p>None</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_server_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_server_side_configuration_for_simple_user_access_operation"></a>57.3. Server-side Configuration for Simple User Access Operation</h3>
+<h3 id="_server_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_server_side_configuration_for_simple_user_access_operation"></a>58.3. Server-side Configuration for Simple User Access Operation</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file on every server machine in the cluster:</p>
 </div>
@@ -10514,7 +10593,7 @@ Refer to the section <a href="#hbase.secure.configuration">Secure Client Access
 </div>
 </div>
 <div class="sect2">
-<h3 id="_client_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation"></a>57.4. Client-side Configuration for Simple User Access Operation</h3>
+<h3 id="_client_side_configuration_for_simple_user_access_operation"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation"></a>58.4. Client-side Configuration for Simple User Access Operation</h3>
 <div class="paragraph">
 <p>Add the following to the <code>hbase-site.xml</code> file on every client:</p>
 </div>
@@ -10541,7 +10620,7 @@ Refer to the section <a href="#hbase.secure.configuration">Secure Client Access
 <p>Be advised that if the <code>hbase.security.authentication</code> settings in the client- and server-side site files do not match, the client will not be able to communicate with the cluster.</p>
 </div>
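 <div class="paragraph">
 <p>For example, the client-side value can also be set programmatically; however it is set, it must mirror the servers&#8217; configuration (a sketch):</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>Configuration conf = HBaseConfiguration.create();
 // Must be identical to the value in the servers' hbase-site.xml.
 conf.set("hbase.security.authentication", "simple");
 Connection connection = ConnectionFactory.createConnection(conf);</pre>
 </div>
 </div>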
 <div class="sect3">
-<h4 id="_client_side_configuration_for_simple_user_access_operation_thrift_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_thrift_gateway"></a>57.4.1. Client-side Configuration for Simple User Access Operation - Thrift Gateway</h4>
+<h4 id="_client_side_configuration_for_simple_user_access_operation_thrift_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_thrift_gateway"></a>58.4.1. Client-side Configuration for Simple User Access Operation - Thrift Gateway</h4>
 <div class="paragraph">
 <p>The Thrift gateway user will need access.
 For example, to give the Thrift API user, <code>thrift_server</code>, administrative access, a command such as this one will suffice:</p>
@@ -10561,7 +10640,7 @@ All client access via the Thrift gateway will use the Thrift gateway&#8217;s cre
 </div>
 </div>
 <div class="sect3">
-<h4 id="_client_side_configuration_for_simple_user_access_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_rest_gateway"></a>57.4.2. Client-side Configuration for Simple User Access Operation - REST Gateway</h4>
+<h4 id="_client_side_configuration_for_simple_user_access_operation_rest_gateway"><a class="anchor" href="#_client_side_configuration_for_simple_user_access_operation_rest_gateway"></a>58.4.2. Client-side Configuration for Simple User Access Operation - REST Gateway</h4>
 <div class="paragraph">
 <p>The REST gateway will authenticate with HBase using the supplied credential.
 No authentication will be performed by the REST gateway itself.
@@ -10588,13 +10667,13 @@ This is future work.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_securing_access_to_hdfs_and_zookeeper"><a class="anchor" href="#_securing_access_to_hdfs_and_zookeeper"></a>58. Securing Access to HDFS and ZooKeeper</h2>
+<h2 id="_securing_access_to_hdfs_and_zookeeper"><a class="anchor" href="#_securing_access_to_hdfs_and_zookeeper"></a>59. Securing Access to HDFS and ZooKeeper</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>Secure HBase requires secure ZooKeeper and HDFS so that users cannot access and/or modify the metadata and data from under HBase. HBase uses HDFS (or configured file system) to keep its data files as well as write ahead logs (WALs) and other data. HBase uses ZooKeeper to store some metadata for operations (master address, table locks, recovery state, etc).</p>
 </div>
 <div class="sect2">
-<h3 id="_securing_zookeeper_data"><a class="anchor" href="#_securing_zookeeper_data"></a>58.1. Securing ZooKeeper Data</h3>
+<h3 id="_securing_zookeeper_data"><a class="anchor" href="#_securing_zookeeper_data"></a>59.1. Securing ZooKeeper Data</h3>
 <div class="paragraph">
 <p>ZooKeeper has a pluggable authentication mechanism to enable access from clients using different methods. ZooKeeper even allows authenticated and un-authenticated clients at the same time. The access to znodes can be restricted by providing Access Control Lists (ACLs) per znode. An ACL contains two components, the authentication method and the principal. ACLs are NOT enforced hierarchically. See <a href="https://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#sc_ZooKeeperPluggableAuthentication">ZooKeeper Programmers Guide</a> for details.</p>
 </div>
@@ -10603,7 +10682,7 @@ This is future work.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_securing_file_system_hdfs_data"><a class="anchor" href="#_securing_file_system_hdfs_data"></a>58.2. Securing File System (HDFS) Data</h3>
+<h3 id="_securing_file_system_hdfs_data"><a class="anchor" href="#_securing_file_system_hdfs_data"></a>59.2. Securing File System (HDFS) Data</h3>
 <div class="paragraph">
 <p>All of the data under management is kept under the root directory in the file system (<code>hbase.rootdir</code>). Access to the data and WAL files in the filesystem should be restricted so that users cannot bypass the HBase layer, and peek at the underlying data files from the file system. HBase assumes the filesystem used (HDFS or other) enforces permissions hierarchically. If sufficient protection from the file system (both authorization and authentication) is not provided, HBase level authorization control (ACLs, visibility labels, etc) is meaningless since the user can always access the data from the file system.</p>
 </div>
@@ -10625,7 +10704,7 @@ This is future work.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="_securing_access_to_your_data"><a class="anchor" href="#_securing_access_to_your_data"></a>59. Securing Access To Your Data</h2>
+<h2 id="_securing_access_to_your_data"><a class="anchor" href="#_securing_access_to_your_data"></a>60. Securing Access To Your Data</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>After you have configured secure authentication between HBase client and server processes and gateways, you need to consider the security of your data itself.
@@ -10704,7 +10783,7 @@ This is the default for HBase 1.0 and newer.</p>
 </ol>
 </div>
 <div class="sect2">
-<h3 id="hbase.tags"><a class="anchor" href="#hbase.tags"></a>59.1. Tags</h3>
+<h3 id="hbase.tags"><a class="anchor" href="#hbase.tags"></a>60.1. Tags</h3>
 <div class="paragraph">
 <p><em class="firstterm">Tags</em> are a feature of HFile v3.
 A tag is a piece of metadata which is part of a cell, separate from the key, value, and version.
@@ -10714,7 +10793,7 @@ It is possible that in the future, tags will be used to implement other HBase fe
 You don&#8217;t need to know a lot about tags in order to use the security features they enable.</p>
 </div>
 <div class="sect3">
-<h4 id="_implementation_details"><a class="anchor" href="#_implementation_details"></a>59.1.1. Implementation Details</h4>
+<h4 id="_implementation_details"><a class="anchor" href="#_implementation_details"></a>60.1.1. Implementation Details</h4>
 <div class="paragraph">
 <p>Every cell can have zero or more tags.
 Every tag has a type and the actual tag byte array.</p>
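 <div class="paragraph">
 <p>As a low-level illustration only (application code does not normally construct tags itself; features such as visibility labels do this internally), a tagged cell can be built with the <code>KeyValue</code> constructor that accepts tags. The tag type byte below is an arbitrary value chosen for the example.</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>Tag tag = new Tag((byte) 65, Bytes.toBytes("example-metadata"));  // 65 is illustrative
 KeyValue kv = new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
     Bytes.toBytes("q"), System.currentTimeMillis(),
     Bytes.toBytes("value"), new Tag[] { tag });</pre>
 </div>
 </div>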
@@ -10735,9 +10814,9 @@ Tag compression uses dictionary encoding.</p>
 </div>
 </div>
 <div class="sect2">
-<h3 id="hbase.accesscontrol.configuration"><a class="anchor" href="#hbase.accesscontrol.configuration"></a>59.2. Access Control Labels (ACLs)</h3>
+<h3 id="hbase.accesscontrol.configuration"><a class="anchor" href="#hbase.accesscontrol.configuration"></a>60.2. Access Control Labels (ACLs)</h3>
 <div class="sect3">
-<h4 id="_how_it_works"><a class="anchor" href="#_how_it_works"></a>59.2.1. How It Works</h4>
+<h4 id="_how_it_works"><a class="anchor" href="#_how_it_works"></a>60.2.1. How It Works</h4>
 <div class="paragraph">
 <p>ACLs in HBase are based upon a user&#8217;s membership in or exclusion from groups, and a given group&#8217;s permissions to access a given resource.
 ACLs are implemented as a coprocessor called AccessController.</p>
@@ -11402,7 +11481,7 @@ hbase&gt; user_permission JAVA_REGEX</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="_visibility_labels"><a class="anchor" href="#_visibility_labels"></a>59.3. Visibility Labels</h3>
+<h3 id="_visibility_labels"><a class="anchor" href="#_visibility_labels"></a>60.3. Visibility Labels</h3>
 <div class="paragraph">
 <p>Visibility label control can be used to permit only users or principals associated with a given label to read or access cells with that label.
 For instance, you might label a cell <code>top-secret</code>, and only grant access to that label to the <code>managers</code> group.
@@ -11515,7 +11594,7 @@ Visibility labels are not currently applied for superusers.
 </tbody>
 </table>
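 <div class="paragraph">
 <p>From the client side, attaching a label expression to a cell is a single call on the <code>Put</code>. A minimal sketch, assuming the label has already been defined and <code>table</code> is an existing table handle:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>Put put = new Put(Bytes.toBytes("row1"));
 put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
 // Only users whose authorizations satisfy this expression can read the cell.
 put.setCellVisibility(new CellVisibility("top-secret"));
 table.put(put);</pre>
 </div>
 </div>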
 <div class="sect3">
-<h4 id="_server_side_configuration_2"><a class="anchor" href="#_server_side_configuration_2"></a>59.3.1. Server-Side Configuration</h4>
+<h4 id="_server_side_configuration_2"><a class="anchor" href="#_server_side_configuration_2"></a>60.3.1. Server-Side Configuration</h4>
 <div class="olist arabic">
 <ol class="arabic">
 <li>
@@ -11565,7 +11644,7 @@ In that case, the mutation will fail if it makes use of labels the user is not a
 </div>
 </div>
 <div class="sect3">
-<h4 id="_administration_2"><a class="anchor" href="#_administration_2"></a>59.3.2. Administration</h4>
+<h4 id="_administration_2"><a class="anchor" href="#_administration_2"></a>60.3.2. Administration</h4>
 <div class="paragraph">
 <p>Administration tasks can be performed using the HBase Shell or the Java API.
 For defining the list of visibility labels and associating labels with users, the HBase Shell is probably simpler.</p>
@@ -11799,7 +11878,7 @@ The correct way to apply cell level labels is to do so in the application code w
 </div>
 </div>
 <div class="sect3">
-<h4 id="reading_cells_with_labels"><a class="anchor" href="#reading_cells_with_labels"></a>59.3.3. Reading Cells with Labels</h4>
+<h4 id="reading_cells_with_labels"><a class="anchor" href="#reading_cells_with_labels"></a>60.3.3. Reading Cells with Labels</h4>
 <div class="paragraph">
 <p>When you issue a Scan or Get, HBase uses your default set of authorizations to
 filter out cells that you do not have access to. A superuser can set the default
@@ -11860,7 +11939,7 @@ public <span class="predefined-type">Void</span> run() <span class="directive">t
 </div>
 </div>
 <div class="sect3">
-<h4 id="_implementing_your_own_visibility_label_algorithm"><a class="anchor" href="#_implementing_your_own_visibility_label_algorithm"></a>59.3.4. Implementing Your Own Visibility Label Algorithm</h4>
+<h4 id="_implementing_your_own_visibility_label_algorithm"><a class="anchor" href="#_implementing_your_own_visibility_label_algorithm"></a>60.3.4. Implementing Your Own Visibility Label Algorithm</h4>
 <div class="paragraph">
 <p>Interpreting the labels authenticated for a given get/scan request is a pluggable algorithm.</p>
 </div>
@@ -11872,7 +11951,7 @@ public <span class="predefined-type">Void</span> run() <span class="directive">t
 </div>
 </div>
 <div class="sect3">
-<h4 id="_replicating_visibility_tags_as_strings"><a class="anchor" href="#_replicating_visibility_tags_as_strings"></a>59.3.5. Replicating Visibility Tags as Strings</h4>
+<h4 id="_replicating_visibility_tags_as_strings"><a class="anchor" href="#_replicating_visibility_tags_as_strings"></a>60.3.5. Replicating Visibility Tags as Strings</h4>
 <div class="paragraph">
 <p>As mentioned in the above sections, the interface <code>VisibilityLabelService</code> could be used to implement a different way of storing the visibility expressions in the cells. Clusters with replication enabled also must replicate the visibility expressions to the peer cluster. If <code>DefaultVisibilityLabelServiceImpl</code> is used as the implementation for <code>VisibilityLabelService</code>, all the visibility expressions are converted to the corresponding expressions based on the ordinals for each visibility label stored in the labels table. During replication, visible cells are also replicated with the ordinal-based expression intact. The peer cluster may not have the same <code>labels</code> table with the same ordinal mapping for the visibility labels. In that case, replicating the ordinals makes no sense. It would be better if the replication occurred with the visibility expressions transmitted as strings. To replicate the visibility expressions as strings to the peer
 cluster, create a <code>RegionServerObserver</code> configuration which works based on the implementation of the <code>VisibilityLabelService</code> interface. The configuration below enables replication of visibility expressions to peer clusters as strings. See <a href="https://issues.apache.org/jira/browse/HBASE-11639">HBASE-11639</a> for more details.</p>
 </div>
@@ -11887,7 +11966,7 @@ public <span class="predefined-type">Void</span> run() <span class="directive">t
 </div>
 </div>
 <div class="sect2">
-<h3 id="hbase.encryption.server"><a class="anchor" href="#hbase.encryption.server"></a>59.4. Transparent Encryption of Data At Rest</h3>
+<h3 id="hbase.encryption.server"><a class="anchor" href="#hbase.encryption.server"></a>60.4. Transparent Encryption of Data At Rest</h3>
 <div class="paragraph">
 <p>HBase provides a mechanism for protecting your data at rest, in HFiles and the WAL, which reside within HDFS or another distributed filesystem.
 A two-tier architecture is used for flexible and non-intrusive key rotation.
@@ -11896,7 +11975,7 @@ When data is written, it is encrypted.
 When it is read, it is decrypted on demand.</p>
 </div>
 <div class="sect3">
-<h4 id="_how_it_works_2"><a class="anchor" href="#_how_it_works_2"></a>59.4.1. How It Works</h4>
+<h4 id="_how_it_works_2"><a class="anchor" href="#_how_it_works_2"></a>60.4.1. How It Works</h4>
 <div class="paragraph">
 <p>The administrator provisions a master key for the cluster, which is stored in a key provider accessible to every trusted HBase process, including the HMaster, RegionServers, and clients (such as HBase Shell) on administrative workstations.
 The default key provider is integrated with the Java KeyStore API and any key management systems with support for it.
@@ -11927,7 +12006,7 @@ When WAL encryption is enabled, all WALs are encrypted, regardless of whether th
 </div>
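 <div class="paragraph">
 <p>Once the cluster is configured as described in the next section, enabling encryption for a column family is a schema-level setting. A sketch with the HBase 1.x API (<code>admin</code> is assumed to be an existing <code>Admin</code> instance):</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre>HTableDescriptor tableDesc = new HTableDescriptor(TableName.valueOf("secure_table"));
 HColumnDescriptor familyDesc = new HColumnDescriptor("cf");
 familyDesc.setEncryptionType("AES");  // HFiles for this family are encrypted with AES
 tableDesc.addFamily(familyDesc);
 admin.createTable(tableDesc);</pre>
 </div>
 </div>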
 </div>
 <div class="sect3">
-<h4 id="_server_side_configuration_3"><a class="anchor" href="#_server_side_configuration_3"></a>59.4.2. Server-Side Configuration</h4>
+<h4 id="_server_side_configuration_3"><a class="anchor" href="#_server_side_configuration_3"></a>60.4.2. Server-Side Configuration</h4>
 <div class="paragraph">
 <p>This procedure assumes you are using the default Java keystore implementation.
 If you are using a custom implementation, check its documentation and adjust accordingly.</p>
@@ -12082,7 +12161,7 @@ You can include these in the HMaster&#8217;s <em>hbase-site.xml</em> as well, bu
 </div>
 </div>
 <div class="sect3">
-<h4 id="_administration_3"><a class="anchor" href="#_administration_3"></a>59.4.3. Administration</h4>
+<h4 id="_administration_3"><a class="anchor" href="#_administration_3"></a>60.4.3. Administration</h4>
 <div class="paragraph">
 <p>Administrative tasks can be performed in HBase Shell or the Java API.</p>
 </div>
@@ -12136,7 +12215,7 @@ Next, configure fallback to the old master key in the <em>hbase-site.xml</em> fi
 </div>
 </div>
 <div class="sect2">
-<h3 id="hbase.secure.bulkload"><a class="anchor" href="#hbase.secure.bulkload"></a>59.5. Secure Bulk Load</h3>
+<h3 id="hbase.secure.bulkload"><a class="anchor" href="#hbase.secure.bulkload"></a>60.5. Secure Bulk Load</h3>
 <div class="paragraph">
 <p>Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the MapReduce job to HBase.
 Secure bulk loading is implemented by a coprocessor, named
@@ -12193,7 +12272,7 @@ HBase manages creation and deletion of this directory.</p>
 </div>
 </div>
 <div class="sect1">
-<h2 id="security.example.config"><a class="anchor" href="#security.example.config"></a>60. Security Configuration Example</h2>
+<h2 id="security.example.config"><a class="anchor" href="#security.example.config"></a>61. Security Configuration Example</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>This configuration example includes support for HFile v3, ACLs, Visibility Labels, and transparent encryption of data at rest and the WAL.
@@ -12343,10 +12422,10 @@ All options have been discussed separately in the sections above.</p>
 </div>
 <h1 id="_architecture" class="sect0"><a class="anchor" href="#_architecture"></a>Architecture</h1>
 <div class="sect1">
-<h2 id="arch.overview"><a class="anchor" href="#arch.overview"></a>61. Overview</h2>
+<h2 id="arch.overview"><a class="anchor" href="#arch.overview"></a>62. Overview</h2>
 <div class="sectionbody">
 <div class="sect2">
-<h3 id="arch.overview.nosql"><a class="anchor" href="#arch.overview.nosql"></a>61.1. NoSQL?</h3>
+<h3 id="arch.overview.nosql"><a class="anchor" href="#arch.overview.nosql"></a>62.1. NoSQL?</h3>
 <div class="paragraph">
 <p>HBase is a type of "NoSQL" database.
 "NoSQL" is a general term meaning that the database isn&#8217;t an RDBMS which supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much a distributed database.
@@ -12394,7 +12473,7 @@ This makes it very suitable for tasks such as high-speed counter aggregation.</p
 </div>
 </div>
 <div class="sect

<TRUNCATED>