Posted to commits@hbase.apache.org by mi...@apache.org on 2015/10/20 22:54:24 UTC

svn commit: r1709680 [2/5] - in /hbase/hbase.apache.org/trunk: ./ devapidocs/org/apache/hadoop/hbase/tmpl/master/ devapidocs/org/apache/hadoop/hbase/tmpl/regionserver/ hbase-annotations/ hbase-spark/ xref-test/org/apache/hadoop/hbase/client/ xref/org/a...

Added: hbase/hbase.apache.org/trunk/book.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/book.html?rev=1709680&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/book.html (added)
+++ hbase/hbase.apache.org/trunk/book.html Tue Oct 20 20:54:21 2015
@@ -0,0 +1,31451 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge"><![endif]-->
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<meta name="generator" content="Asciidoctor 1.5.2">
+<meta name="author" content="Apache HBase Team">
+<title>Apache HBase &#8482; Reference Guide</title>
+<link rel="stylesheet" href="./hbase.css">
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.2.0/css/font-awesome.min.css">
+<link rel="stylesheet" href="./coderay-asciidoctor.css">
+</head>
+<body class="book toc2 toc-left">
+<div id="header">
+<h1>Apache HBase &#8482; Reference Guide</h1>
+<div class="details">
+<span id="author" class="author">Apache HBase Team</span><br>
+<span id="email" class="email">&lt;<a href="mailto:hbase-dev@lists.apache.org">hbase-dev@lists.apache.org</a>&gt;</span><br>
+<span id="revnumber">version 2.0.0-SNAPSHOT</span>
+</div>
+<div id="toc" class="toc2">
+<div id="toctitle">Contents</div>
+<ul class="sectlevel1">
+<li><a href="#_preface">Preface</a></li>
+<li><a href="#_getting_started">Getting Started</a>
+<ul class="sectlevel1">
+<li><a href="#_introduction">1. Introduction</a></li>
+<li><a href="#quickstart">2. Quick Start - Standalone HBase</a></li>
+</ul>
+</li>
+<li><a href="#configuration">Apache HBase Configuration</a>
+<ul class="sectlevel1">
+<li><a href="#_configuration_files">3. Configuration Files</a></li>
+<li><a href="#basic.prerequisites">4. Basic Prerequisites</a></li>
+<li><a href="#standalone_dist">5. HBase run modes: Standalone and Distributed</a></li>
+<li><a href="#confirm">6. Running and Confirming Your Installation</a></li>
+<li><a href="#config.files">7. Default Configuration</a></li>
+<li><a href="#example_config">8. Example Configurations</a></li>
+<li><a href="#important_configurations">9. The Important Configurations</a></li>
+<li><a href="#dyn_config">10. Dynamic Configuration</a></li>
+</ul>
+</li>
+<li><a href="#upgrading">Upgrading</a>
+<ul class="sectlevel1">
+<li><a href="#hbase.versioning">11. HBase version number and compatibility</a></li>
+<li><a href="#_upgrade_paths">12. Upgrade Paths</a></li>
+</ul>
+</li>
+<li><a href="#shell">The Apache HBase Shell</a>
+<ul class="sectlevel1">
+<li><a href="#scripting">13. Scripting with Ruby</a></li>
+<li><a href="#_running_the_shell_in_non_interactive_mode">14. Running the Shell in Non-Interactive Mode</a></li>
+<li><a href="#hbase.shell.noninteractive">15. HBase Shell in OS Scripts</a></li>
+<li><a href="#_read_hbase_shell_commands_from_a_command_file">16. Read HBase Shell Commands from a Command File</a></li>
+<li><a href="#_passing_vm_options_to_the_shell">17. Passing VM Options to the Shell</a></li>
+<li><a href="#_shell_tricks">18. Shell Tricks</a></li>
+</ul>
+</li>
+<li><a href="#datamodel">Data Model</a>
+<ul class="sectlevel1">
+<li><a href="#conceptual.view">19. Conceptual View</a></li>
+<li><a href="#physical.view">20. Physical View</a></li>
+<li><a href="#_namespace">21. Namespace</a></li>
+<li><a href="#_table">22. Table</a></li>
+<li><a href="#_row">23. Row</a></li>
+<li><a href="#columnfamily">24. Column Family</a></li>
+<li><a href="#_cells">25. Cells</a></li>
+<li><a href="#_data_model_operations">26. Data Model Operations</a></li>
+<li><a href="#versions">27. Versions</a></li>
+<li><a href="#dm.sort">28. Sort Order</a></li>
+<li><a href="#dm.column.metadata">29. Column Metadata</a></li>
+<li><a href="#_joins">30. Joins</a></li>
+<li><a href="#_acid">31. ACID</a></li>
+</ul>
+</li>
+<li><a href="#schema">HBase and Schema Design</a>
+<ul class="sectlevel1">
+<li><a href="#schema.creation">32. Schema Creation</a></li>
+<li><a href="#number.of.cfs">33. On the number of column families</a></li>
+<li><a href="#rowkey.design">34. Rowkey Design</a></li>
+<li><a href="#schema.versions">35. Number of Versions</a></li>
+<li><a href="#supported.datatypes">36. Supported Datatypes</a></li>
+<li><a href="#schema.joins">37. Joins</a></li>
+<li><a href="#ttl">38. Time To Live (TTL)</a></li>
+<li><a href="#cf.keep.deleted">39. Keeping Deleted Cells</a></li>
+<li><a href="#secondary.indexes">40. Secondary Indexes and Alternate Query Paths</a></li>
+<li><a href="#_constraints">41. Constraints</a></li>
+<li><a href="#schema.casestudies">42. Schema Design Case Studies</a></li>
+<li><a href="#schema.ops">43. Operational and Performance Configuration Options</a></li>
+</ul>
+</li>
+<li><a href="#mapreduce">HBase and MapReduce</a>
+<ul class="sectlevel1">
+<li><a href="#hbase.mapreduce.classpath">44. HBase, MapReduce, and the CLASSPATH</a></li>
+<li><a href="#_mapreduce_scan_caching">45. MapReduce Scan Caching</a></li>
+<li><a href="#_bundled_hbase_mapreduce_jobs">46. Bundled HBase MapReduce Jobs</a></li>
+<li><a href="#_hbase_as_a_mapreduce_job_data_source_and_data_sink">47. HBase as a MapReduce Job Data Source and Data Sink</a></li>
+<li><a href="#_writing_hfiles_directly_during_bulk_import">48. Writing HFiles Directly During Bulk Import</a></li>
+<li><a href="#_rowcounter_example">49. RowCounter Example</a></li>
+<li><a href="#splitter">50. Map-Task Splitting</a></li>
+<li><a href="#mapreduce.example">51. HBase MapReduce Examples</a></li>
+<li><a href="#mapreduce.htable.access">52. Accessing Other HBase Tables in a MapReduce Job</a></li>
+<li><a href="#mapreduce.specex">53. Speculative Execution</a></li>
+</ul>
+</li>
+<li><a href="#security">Securing Apache HBase</a>
+<ul class="sectlevel1">
+<li><a href="#_using_secure_http_https_for_the_web_ui">54. Using Secure HTTP (HTTPS) for the Web UI</a></li>
+<li><a href="#hbase.secure.configuration">55. Secure Client Access to Apache HBase</a></li>
+<li><a href="#hbase.secure.simpleconfiguration">56. Simple User Access to Apache HBase</a></li>
+<li><a href="#_securing_access_to_hdfs_and_zookeeper">57. Securing Access to HDFS and ZooKeeper</a></li>
+<li><a href="#_securing_access_to_your_data">58. Securing Access To Your Data</a></li>
+<li><a href="#security.example.config">59. Security Configuration Example</a></li>
+</ul>
+</li>
+<li><a href="#_architecture">Architecture</a>
+<ul class="sectlevel1">
+<li><a href="#arch.overview">60. Overview</a></li>
+<li><a href="#arch.catalog">61. Catalog Tables</a></li>
+<li><a href="#architecture.client">62. Client</a></li>
+<li><a href="#client.filter">63. Client Request Filters</a></li>
+<li><a href="#_master">64. Master</a></li>
+<li><a href="#regionserver.arch">65. RegionServer</a></li>
+<li><a href="#regions.arch">66. Regions</a></li>
+<li><a href="#arch.bulk.load">67. Bulk Loading</a></li>
+<li><a href="#arch.hdfs">68. HDFS</a></li>
+<li><a href="#arch.timelineconsistent.reads">69. Timeline-consistent High Available Reads</a></li>
+<li><a href="#hbase_mob">70. Storing Medium-sized Objects (MOB)</a></li>
+</ul>
+</li>
+<li><a href="#hbase_apis">Apache HBase APIs</a>
+<ul class="sectlevel1">
+<li><a href="#_examples">71. Examples</a></li>
+</ul>
+</li>
+<li><a href="#external_apis">Apache HBase External APIs</a>
+<ul class="sectlevel1">
+<li><a href="#nonjava.jvm">72. Non-Java Languages Talking to the JVM</a></li>
+<li><a href="#_rest">73. REST</a></li>
+<li><a href="#_thrift">74. Thrift</a></li>
+<li><a href="#c">75. C/C++ Apache HBase Client</a></li>
+</ul>
+</li>
+<li><a href="#thrift">Thrift API and Filter Language</a>
+<ul class="sectlevel1">
+<li><a href="#thrift.filter_language">76. Filter Language</a></li>
+</ul>
+</li>
+<li><a href="#spark">HBase and Spark</a>
+<ul class="sectlevel1">
+<li><a href="#_basic_spark">77. Basic Spark</a></li>
+<li><a href="#_spark_streaming">78. Spark Streaming</a></li>
+<li><a href="#_bulk_load">79. Bulk Load</a></li>
+<li><a href="#_sparksql_dataframes">80. SparkSQL/DataFrames</a></li>
+</ul>
+</li>
+<li><a href="#cp">Apache HBase Coprocessors</a>
+<ul class="sectlevel1">
+<li><a href="#_coprocessor_framework">81. Coprocessor Framework</a></li>
+<li><a href="#_examples_2">82. Examples</a></li>
+<li><a href="#_building_a_coprocessor">83. Building A Coprocessor</a></li>
+<li><a href="#_check_the_status_of_a_coprocessor">84. Check the Status of a Coprocessor</a></li>
+<li><a href="#_monitor_time_spent_in_coprocessors">85. Monitor Time Spent in Coprocessors</a></li>
+</ul>
+</li>
+<li><a href="#performance">Apache HBase Performance Tuning</a>
+<ul class="sectlevel1">
+<li><a href="#perf.os">86. Operating System</a></li>
+<li><a href="#perf.network">87. Network</a></li>
+<li><a href="#jvm">88. Java</a></li>
+<li><a href="#perf.configurations">89. HBase Configurations</a></li>
+<li><a href="#perf.zookeeper">90. ZooKeeper</a></li>
+<li><a href="#perf.schema">91. Schema Design</a></li>
+<li><a href="#perf.general">92. HBase General Patterns</a></li>
+<li><a href="#perf.writing">93. Writing to HBase</a></li>
+<li><a href="#perf.reading">94. Reading from HBase</a></li>
+<li><a href="#perf.deleting">95. Deleting from HBase</a></li>
+<li><a href="#perf.hdfs">96. HDFS</a></li>
+<li><a href="#perf.ec2">97. Amazon EC2</a></li>
+<li><a href="#perf.hbase.mr.cluster">98. Collocating HBase and MapReduce</a></li>
+<li><a href="#perf.casestudy">99. Case Studies</a></li>
+</ul>
+</li>
+<li><a href="#trouble">Troubleshooting and Debugging Apache HBase</a>
+<ul class="sectlevel1">
+<li><a href="#trouble.general">100. General Guidelines</a></li>
+<li><a href="#trouble.log">101. Logs</a></li>
+<li><a href="#trouble.resources">102. Resources</a></li>
+<li><a href="#trouble.tools">103. Tools</a></li>
+<li><a href="#trouble.client">104. Client</a></li>
+<li><a href="#trouble.mapreduce">105. MapReduce</a></li>
+<li><a href="#trouble.namenode">106. NameNode</a></li>
+<li><a href="#trouble.network">107. Network</a></li>
+<li><a href="#trouble.rs">108. RegionServer</a></li>
+<li><a href="#trouble.master">109. Master</a></li>
+<li><a href="#trouble.zookeeper">110. ZooKeeper</a></li>
+<li><a href="#trouble.ec2">111. Amazon EC2</a></li>
+<li><a href="#trouble.versions">112. HBase and Hadoop version issues</a></li>
+<li><a href="#_ipc_configuration_conflicts_with_hadoop">113. IPC Configuration Conflicts with Hadoop</a></li>
+<li><a href="#_hbase_and_hdfs">114. HBase and HDFS</a></li>
+<li><a href="#trouble.tests">115. Running unit or integration tests</a></li>
+<li><a href="#trouble.casestudy">116. Case Studies</a></li>
+<li><a href="#trouble.crypto">117. Cryptographic Features</a></li>
+<li><a href="#_operating_system_specific_issues">118. Operating System Specific Issues</a></li>
+<li><a href="#_jdk_issues">119. JDK Issues</a></li>
+</ul>
+</li>
+<li><a href="#casestudies">Apache HBase Case Studies</a>
+<ul class="sectlevel1">
+<li><a href="#casestudies.overview">120. Overview</a></li>
+<li><a href="#casestudies.schema">121. Schema Design</a></li>
+<li><a href="#casestudies.perftroub">122. Performance/Troubleshooting</a></li>
+</ul>
+</li>
+<li><a href="#ops_mgt">Apache HBase Operational Management</a>
+<ul class="sectlevel1">
+<li><a href="#tools">123. HBase Tools and Utilities</a></li>
+<li><a href="#ops.regionmgt">124. Region Management</a></li>
+<li><a href="#node.management">125. Node Management</a></li>
+<li><a href="#_hbase_metrics">126. HBase Metrics</a></li>
+<li><a href="#ops.monitoring">127. HBase Monitoring</a></li>
+<li><a href="#_cluster_replication">128. Cluster Replication</a></li>
+<li><a href="#_running_multiple_workloads_on_a_single_cluster">129. Running Multiple Workloads On a Single Cluster</a></li>
+<li><a href="#ops.backup">130. HBase Backup</a></li>
+<li><a href="#ops.snapshots">131. HBase Snapshots</a></li>
+<li><a href="#ops.capacity">132. Capacity Planning and Region Sizing</a></li>
+<li><a href="#table.rename">133. Table Rename</a></li>
+</ul>
+</li>
+<li><a href="#developer">Building and Developing Apache HBase</a>
+<ul class="sectlevel1">
+<li><a href="#getting.involved">134. Getting Involved</a></li>
+<li><a href="#repos">135. Apache HBase Repositories</a></li>
+<li><a href="#_ides">136. IDEs</a></li>
+<li><a href="#build">137. Building Apache HBase</a></li>
+<li><a href="#releasing">138. Releasing Apache HBase</a></li>
+<li><a href="#hbase.rc.voting">139. Voting on Release Candidates</a></li>
+<li><a href="#documentation">140. Generating the HBase Reference Guide</a></li>
+<li><a href="#hbase.org">141. Updating <a href="http://hbase.apache.org">hbase.apache.org</a></a></li>
+<li><a href="#hbase.tests">142. Tests</a></li>
+<li><a href="#developing">143. Developer Guidelines</a></li>
+</ul>
+</li>
+<li><a href="#unit.tests">Unit Testing HBase Applications</a>
+<ul class="sectlevel1">
+<li><a href="#_junit">144. JUnit</a></li>
+<li><a href="#_mockito">145. Mockito</a></li>
+<li><a href="#_mrunit">146. MRUnit</a></li>
+<li><a href="#_integration_testing_with_a_hbase_mini_cluster">147. Integration Testing with a HBase Mini-Cluster</a></li>
+</ul>
+</li>
+<li><a href="#zookeeper">ZooKeeper</a>
+<ul class="sectlevel1">
+<li><a href="#_using_existing_zookeeper_ensemble">148. Using existing ZooKeeper ensemble</a></li>
+<li><a href="#zk.sasl.auth">149. SASL Authentication with ZooKeeper</a></li>
+</ul>
+</li>
+<li><a href="#community">Community</a>
+<ul class="sectlevel1">
+<li><a href="#_decisions">150. Decisions</a></li>
+<li><a href="#community.roles">151. Community Roles</a></li>
+<li><a href="#hbase.commit.msg.format">152. Commit Message format</a></li>
+</ul>
+</li>
+<li><a href="#_appendix">Appendix</a>
+<ul class="sectlevel1">
+<li><a href="#appendix_contributing_to_documentation">Appendix A: Contributing to Documentation</a></li>
+<li><a href="#faq">Appendix B: FAQ</a></li>
+<li><a href="#hbck.in.depth">Appendix C: hbck In Depth</a></li>
+<li><a href="#appendix_acl_matrix">Appendix D: Access Control Matrix</a></li>
+<li><a href="#compression">Appendix E: Compression and Data Block Encoding In HBase</a></li>
+<li><a href="#data.block.encoding.enable">153. Enable Data Block Encoding</a></li>
+<li><a href="#sql">Appendix F: SQL over HBase</a></li>
+<li><a href="#_ycsb">Appendix G: YCSB</a></li>
+<li><a href="#_hfile_format_2">Appendix H: HFile format</a></li>
+<li><a href="#other.info">Appendix I: Other Information About HBase</a></li>
+<li><a href="#hbase.history">Appendix J: HBase History</a></li>
+<li><a href="#asf">Appendix K: HBase and the Apache Software Foundation</a></li>
+<li><a href="#orca">Appendix L: Apache HBase Orca</a></li>
+<li><a href="#tracing">Appendix M: Enabling Dapper-like Tracing in HBase</a></li>
+<li><a href="#tracing.client.modifications">154. Client Modifications</a></li>
+<li><a href="#tracing.client.shell">155. Tracing from HBase Shell</a></li>
+<li><a href="#hbase.rpc">Appendix N: 0.95 RPC Specification</a></li>
+</ul>
+</li>
+</ul>
+</div>
+</div>
+<div id="content">
+<div id="preamble">
+<div class="sectionbody">
+<div>
+  <a href="http://hbase.apache.org"><img src="images/hbase_logo_with_orca.png" alt="Apache HBase Logo" /></a>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_preface"><a class="anchor" href="#_preface"></a>Preface</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This is the official reference guide for the <a href="http://hbase.apache.org/">HBase</a> version it ships with.</p>
+</div>
+<div class="paragraph">
+<p>Herein you will find either the definitive documentation on an HBase topic as of its standing when the referenced HBase version shipped, or it will point to the location in <a href="http://hbase.apache.org/apidocs/index.html">Javadoc</a>, <a href="https://issues.apache.org/jira/browse/HBASE">JIRA</a> or <a href="http://wiki.apache.org/hadoop/Hbase">wiki</a> where the pertinent information can be found.</p>
+</div>
+<div class="paragraph">
+<div class="title">About This Guide</div>
+<p>This reference guide is a work in progress. The source for this guide can be found in the <em>src/main/asciidoc</em> directory of the HBase source. This reference guide is marked up using <a href="http://asciidoc.org/">AsciiDoc</a>, from which the finished guide is generated as part of the 'site' build target. Run</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bourne">mvn site</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>to generate this documentation.
+Amendments and improvements to the documentation are welcomed.
+Click <a href="https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12310753&amp;issuetype=1&amp;components=12312132&amp;summary=SHORT+DESCRIPTION">this link</a> to file a new documentation bug against Apache HBase with some values pre-selected.</p>
+</div>
+<div class="paragraph">
+<div class="title">Contributing to the Documentation</div>
+<p>For an overview of AsciiDoc and suggestions to get started contributing to the documentation, see the <a href="#appendix_contributing_to_documentation">relevant section later in this documentation</a>.</p>
+</div>
+<div class="paragraph">
+<div class="title">Heads-up if this is your first foray into the world of distributed computing&#8230;&#8203;</div>
+<p>If this is your first foray into the wonderful world of Distributed Computing, then you are in for some interesting times.
+First off, distributed systems are hard; making a distributed system hum requires a disparate skillset that spans systems (hardware and software) and networking.</p>
+</div>
+<div class="paragraph">
+<p>Your cluster&#8217;s operation can hiccup for any of a myriad of reasons, from bugs in HBase itself, through misconfigurations&#8201;&#8212;&#8201;of HBase, but also of the operating system&#8201;&#8212;&#8201;to hardware problems, whether a bug in your network card drivers or an underprovisioned RAM bus (to mention two recent examples of hardware issues that manifested as "HBase is slow"). You will also need to recalibrate if up to this point your computing has been bound to a single box.
+Here is one good starting point: <a href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing">Fallacies of Distributed Computing</a>.</p>
+</div>
+<div class="paragraph">
+<p>That said, you are welcome.<br>
+It&#8217;s a fun place to be.<br>
+Yours, the HBase Community.</p>
+</div>
+<div class="paragraph">
+<div class="title">Reporting Bugs</div>
+<p>Please use <a href="https://issues.apache.org/jira/browse/hbase">JIRA</a> to report non-security-related bugs.</p>
+</div>
+<div class="paragraph">
+<p>To protect existing HBase installations from new vulnerabilities, please <strong>do not</strong> use JIRA to report security-related bugs. Instead, send your report to the mailing list <a href="mailto:private@apache.org">private@apache.org</a>, which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report.</p>
+</div>
+</div>
+</div>
+<h1 id="_getting_started" class="sect0"><a class="anchor" href="#_getting_started"></a>Getting Started</h1>
+<div class="sect1">
+<h2 id="_introduction"><a class="anchor" href="#_introduction"></a>1. Introduction</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p><a href="#quickstart">Quickstart</a> will get you up and running on a single-node, standalone instance of HBase, followed by a pseudo-distributed single-machine instance, and finally a fully-distributed cluster.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="quickstart"><a class="anchor" href="#quickstart"></a>2. Quick Start - Standalone HBase</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This guide describes the setup of a standalone HBase instance running against the local filesystem.
+This is not an appropriate configuration for a production instance of HBase, but will allow you to experiment with HBase.
+This section shows you how to create a table in HBase using the <code>hbase shell</code> CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase.
+Apart from downloading HBase, this procedure should take less than 10 minutes.</p>
+</div>
+<div class="admonitionblock warning">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-warning" title="Warning"></i>
+</td>
+<td class="content">
+<div class="title">Local Filesystem and Durability</div>
+<em>The following is fixed in HBase 0.98.3 and beyond. See <a href="https://issues.apache.org/jira/browse/HBASE-11272">HBASE-11272</a> and <a href="https://issues.apache.org/jira/browse/HBASE-11218">HBASE-11218</a>.</em>
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>Using HBase with a local filesystem does not guarantee durability.
+The HDFS local filesystem implementation will lose edits if files are not properly closed.
+This is very likely to happen when you are experimenting with new software, starting and stopping the daemons often and not always cleanly.
+You need to run HBase on HDFS to ensure all writes are preserved.
+Running against the local filesystem is intended as a shortcut to get you familiar with how the general system works, as the very first phase of evaluation.
+See <a href="https://issues.apache.org/jira/browse/HBASE-3696">HBASE-3696</a> and its associated issues for more details about the issues of running on the local filesystem.</p>
+</div>
+<div id="loopback.ip" class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="title">Loopback IP - HBase 0.94.x and earlier</div>
+<em>The below advice is for hbase-0.94.x and older versions only. This is fixed in hbase-0.96.0 and beyond.</em>
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>In HBase 0.94.x and earlier, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, and this will cause problems for you. See <a href="http://devving.com/?p=414">Why does HBase care about /etc/hosts?</a> for details.</p>
+</div>
+<div class="exampleblock">
+<div class="title">Example 1. Example /etc/hosts File for Ubuntu</div>
+<div class="content">
+<div class="paragraph">
+<p>The following <em>/etc/hosts</em> file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>127.0.0.1 localhost
+127.0.0.1 ubuntu.ubuntu-domain ubuntu</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_jdk_version_requirements"><a class="anchor" href="#_jdk_version_requirements"></a>2.1. JDK Version Requirements</h3>
+<div class="paragraph">
+<p>HBase requires that a JDK be installed.
+See <a href="#java">Java</a> for information about supported JDK versions.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_get_started_with_hbase"><a class="anchor" href="#_get_started_with_hbase"></a>2.2. Get Started with HBase</h3>
+<div class="olist arabic">
+<div class="title">Procedure: Download, Configure, and Start HBase</div>
+<ol class="arabic">
+<li>
+<p>Choose a download site from this list of <a href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download Mirrors</a>.
+Click on the suggested top link.
+This will take you to a mirror of <em>HBase
+Releases</em>.
+Click on the folder named <em>stable</em> and then download the binary file that ends in <em>.tar.gz</em> to your local filesystem.
+For releases prior to 1.x, be sure to choose the version that corresponds with the version of Hadoop you are
+likely to use later (in most cases, you should choose the file for Hadoop 2, which will be called
+something like <em>hbase-0.98.13-hadoop2-bin.tar.gz</em>).
+Do not download the file ending in <em>src.tar.gz</em> for now.</p>
+</li>
+<li>
+<p>Extract the downloaded file, and change to the newly-created directory.</p>
+<div class="listingblock">
+<div class="content">
+<pre>$ tar xzvf hbase-&lt;?eval ${project.version}?&gt;-bin.tar.gz
+$ cd hbase-&lt;?eval ${project.version}?&gt;/</pre>
+</div>
+</div>
+</li>
+<li>
+<p>For HBase 0.98.5 and later, you are required to set the <code>JAVA_HOME</code> environment variable before starting HBase.
+Prior to 0.98.5, HBase attempted to detect the location of Java if the variable was not set.
+You can set the variable via your operating system&#8217;s usual mechanism, but HBase provides a central mechanism, <em>conf/hbase-env.sh</em>.
+Edit this file, uncomment the line starting with <code>JAVA_HOME</code>, and set it to the appropriate location for your operating system.
+The <code>JAVA_HOME</code> variable should be set to a directory which contains the executable file <em>bin/java</em>.
+Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java.
+In this case, you can set <code>JAVA_HOME</code> to the directory containing the symbolic link to <em>bin/java</em>, which is usually <em>/usr</em>.</p>
+<div class="listingblock">
+<div class="content">
+<pre>JAVA_HOME=/usr</pre>
+</div>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+These instructions assume that each node of your cluster uses the same configuration.
+If this is not the case, you may need to set <code>JAVA_HOME</code> separately for each node.
+</td>
+</tr>
+</table>
+</div>
+</li>
+<li>
+<p>Edit <em>conf/hbase-site.xml</em>, which is the main HBase configuration file.
+At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data.
+By default, a new directory is created under /tmp.
+Many servers are configured to delete the contents of <em>/tmp</em> upon reboot, so you should store the data elsewhere.
+The following configuration will store HBase&#8217;s data in the <em>hbase</em> directory, in the home directory of the user called <code>testuser</code>.
+Paste the <code>&lt;property&gt;</code> tags beneath the <code>&lt;configuration&gt;</code> tags, which should be empty in a new HBase install.</p>
+<div class="exampleblock">
+<div class="title">Example 2. Example <em>hbase-site.xml</em> for Standalone HBase</div>
+<div class="content">
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="xml"><span class="tag">&lt;configuration&gt;</span>
+  <span class="tag">&lt;property&gt;</span>
+    <span class="tag">&lt;name&gt;</span>hbase.rootdir<span class="tag">&lt;/name&gt;</span>
+    <span class="tag">&lt;value&gt;</span>file:///home/testuser/hbase<span class="tag">&lt;/value&gt;</span>
+  <span class="tag">&lt;/property&gt;</span>
+  <span class="tag">&lt;property&gt;</span>
+    <span class="tag">&lt;name&gt;</span>hbase.zookeeper.property.dataDir<span class="tag">&lt;/name&gt;</span>
+    <span class="tag">&lt;value&gt;</span>/home/testuser/zookeeper<span class="tag">&lt;/value&gt;</span>
+  <span class="tag">&lt;/property&gt;</span>
+<span class="tag">&lt;/configuration&gt;</span></code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="paragraph">
+<p>You do not need to create the HBase data directory.
+HBase will do this for you.
+If you create the directory, HBase will attempt to do a migration, which is not what you want.</p>
+</div>
+</li>
+<li>
+<p>The <em>bin/start-hbase.sh</em> script is provided as a convenient way to start HBase.
+Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully.
+You can use the <code>jps</code> command to verify that you have one running process called <code>HMaster</code>.
+In standalone mode HBase runs all daemons within this single JVM, i.e.
+the HMaster, a single HRegionServer, and the ZooKeeper daemon.</p>
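+<div class="paragraph">
+<p>For illustration, <code>jps</code> output on a standalone instance might look similar to the following; the process IDs shown here are examples only and will differ on your system.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ jps
+6288 HMaster
+6366 Jps</pre>
+</div>
+</div>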
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Java needs to be installed and available.
+If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit the <em>conf/hbase-env.sh</em> file and modify the <code>JAVA_HOME</code> setting to point to the directory that contains <em>bin/java</em> on your system.
+</td>
+</tr>
+</table>
+</div>
+</li>
+</ol>
+</div>
+<div id="shell_exercises" class="olist arabic">
+<div class="title">Procedure: Use HBase For the First Time</div>
+<ol class="arabic">
+<li>
+<p>Connect to HBase.</p>
+<div class="paragraph">
+<p>Connect to your running instance of HBase using the <code>hbase shell</code> command, located in the <em class="path">bin/</em> directory of your HBase install.
+In this example, some usage and version information that is printed when you start HBase Shell has been omitted.
+The HBase Shell prompt ends with a <code>&gt;</code> character.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/hbase shell
+hbase(main):001:0&gt;</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Display HBase Shell Help Text.</p>
+<div class="paragraph">
+<p>Type <code>help</code> and press Enter to display some basic usage information for HBase Shell, as well as several example commands.
+Notice that table names, rows, and columns must all be enclosed in quote characters.</p>
+</div>
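+<div class="paragraph">
+<p>For example, typing <code>help</code> at the prompt looks like this (the lengthy help text itself is omitted here):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):001:0&gt; help</pre>
+</div>
+</div>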
+</li>
+<li>
+<p>Create a table.</p>
+<div class="paragraph">
+<p>Use the <code>create</code> command to create a new table.
+You must specify the table name and the ColumnFamily name.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):001:0&gt; create 'test', 'cf'
+0 row(s) in 0.4170 seconds
+
+=&gt; Hbase::Table - test</pre>
+</div>
+</div>
+</li>
+<li>
+<p>List Information About your Table.</p>
+<div class="paragraph">
+<p>Use the <code>list</code> command to list information about your table.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):002:0&gt; list 'test'
+TABLE
+test
+1 row(s) in 0.0180 seconds
+
+=&gt; ["test"]</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Put data into your table.</p>
+<div class="paragraph">
+<p>To put data into your table, use the <code>put</code> command.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):003:0&gt; put 'test', 'row1', 'cf:a', 'value1'
+0 row(s) in 0.0850 seconds
+
+hbase(main):004:0&gt; put 'test', 'row2', 'cf:b', 'value2'
+0 row(s) in 0.0110 seconds
+
+hbase(main):005:0&gt; put 'test', 'row3', 'cf:c', 'value3'
+0 row(s) in 0.0100 seconds</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Here, we insert three values, one at a time.
+The first insert is at <code>row1</code>, column <code>cf:a</code>, with a value of <code>value1</code>.
+Columns in HBase are made up of a column family prefix, <code>cf</code> in this example, followed by a colon and then a column qualifier suffix, <code>a</code> in this case.</p>
+</div>
+</li>
+<li>
+<p>Scan the table for all data at once.</p>
+<div class="paragraph">
+<p>One of the ways to get data from HBase is to scan.
+Use the <code>scan</code> command to scan the table for data.
+You can limit your scan, but for now, all data is fetched.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):006:0&gt; scan 'test'
+ROW                                      COLUMN+CELL
+ row1                                    column=cf:a, timestamp=1421762485768, value=value1
+ row2                                    column=cf:b, timestamp=1421762491785, value=value2
+ row3                                    column=cf:c, timestamp=1421762496210, value=value3
+3 row(s) in 0.0230 seconds</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Get a single row of data.</p>
+<div class="paragraph">
+<p>To get a single row of data at a time, use the <code>get</code> command.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):007:0&gt; get 'test', 'row1'
+COLUMN                                   CELL
+ cf:a                                    timestamp=1421762485768, value=value1
+1 row(s) in 0.0350 seconds</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Disable a table.</p>
+<div class="paragraph">
+<p>If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the <code>disable</code> command.
+You can re-enable it using the <code>enable</code> command.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):008:0&gt; disable 'test'
+0 row(s) in 1.1820 seconds
+
+hbase(main):009:0&gt; enable 'test'
+0 row(s) in 0.1770 seconds</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Disable the table again if you tested the <code>enable</code> command above:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):010:0&gt; disable 'test'
+0 row(s) in 1.1820 seconds</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Drop the table.</p>
+<div class="paragraph">
+<p>To drop (delete) a table, use the <code>drop</code> command.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):011:0&gt; drop 'test'
+0 row(s) in 0.1370 seconds</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Exit the HBase Shell.</p>
+<div class="paragraph">
+<p>To exit the HBase Shell and disconnect from your cluster, use the <code>quit</code> command.
+HBase is still running in the background.</p>
+</div>
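+<div class="paragraph">
+<p>For example, continuing from the previous commands:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hbase(main):012:0&gt; quit
+$</pre>
+</div>
+</div>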
+</li>
+</ol>
+</div>
+<div class="olist arabic">
+<div class="title">Procedure: Stop HBase</div>
+<ol class="arabic">
+<li>
+<p>In the same way that the <em>bin/start-hbase.sh</em> script is provided to conveniently start all HBase daemons, the <em>bin/stop-hbase.sh</em>            script stops them.</p>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/stop-hbase.sh
+stopping hbase....................
+$</pre>
+</div>
+</div>
+</li>
+<li>
+<p>After issuing the command, it can take several minutes for the processes to shut down.
+Use the <code>jps</code> command to be sure that the HMaster and HRegionServer processes are shut down.</p>
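+<div class="paragraph">
+<p>For example, once shutdown is complete, <code>jps</code> should no longer list HMaster or HRegionServer; the process ID below is illustrative only.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ jps
+6415 Jps</pre>
+</div>
+</div>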
+</li>
+</ol>
+</div>
+</div>
+<div class="sect2">
+<h3 id="quickstart_pseudo"><a class="anchor" href="#quickstart_pseudo"></a>2.3. Intermediate - Pseudo-Distributed Local Install</h3>
+<div class="paragraph">
+<p>After working your way through <a href="#quickstart">quickstart</a>, you can re-configure HBase to run in pseudo-distributed mode.
+Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process.
+By default, unless you configure the <code>hbase.rootdir</code> property as described in <a href="#quickstart">quickstart</a>, your data is still stored in <em>/tmp/</em>.
+In this walk-through, we store your data in HDFS instead, assuming you have HDFS available.
+You can skip the HDFS configuration to continue storing your data in the local filesystem.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="title">Hadoop Configuration</div>
+<div class="paragraph">
+<p>This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote
+system, and that they are running and available. It also assumes you are using Hadoop 2.
+The guide on
+<a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html">Setting up a Single Node Cluster</a>
+in the Hadoop documentation is a good starting point.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Stop HBase if it is running.</p>
+<div class="paragraph">
+<p>If you have just finished <a href="#quickstart">quickstart</a> and HBase is still running, stop it.
+This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.</p>
+</div>
+</li>
+<li>
+<p>Configure HBase.</p>
+<div class="paragraph">
+<p>Edit the <em>hbase-site.xml</em> configuration.
+First, add the following property,
+which directs HBase to run in distributed mode, with one JVM instance per daemon.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="xml"><span class="tag">&lt;property&gt;</span>
+  <span class="tag">&lt;name&gt;</span>hbase.cluster.distributed<span class="tag">&lt;/name&gt;</span>
+  <span class="tag">&lt;value&gt;</span>true<span class="tag">&lt;/value&gt;</span>
+<span class="tag">&lt;/property&gt;</span></code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Next, change the <code>hbase.rootdir</code> from the local filesystem to the address of your HDFS instance, using the <code>hdfs://</code> URI syntax.
+In this example, HDFS is running on the localhost at port 8020.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="xml"><span class="tag">&lt;property&gt;</span>
+  <span class="tag">&lt;name&gt;</span>hbase.rootdir<span class="tag">&lt;/name&gt;</span>
+  <span class="tag">&lt;value&gt;</span>hdfs://localhost:8020/hbase<span class="tag">&lt;/value&gt;</span>
+<span class="tag">&lt;/property&gt;</span></code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>You do not need to create the directory in HDFS.
+HBase will do this for you.
+If you create the directory, HBase will attempt to do a migration, which is not what you want.</p>
+</div>
+</li>
+<li>
+<p>Start HBase.</p>
+<div class="paragraph">
+<p>Use the <em>bin/start-hbase.sh</em> command to start HBase.
+If your system is configured correctly, the <code>jps</code> command should show the HMaster and HRegionServer processes running.</p>
+</div>
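+<div class="paragraph">
+<p>For illustration, the output might look similar to the following; the process IDs are examples only. When HBase manages ZooKeeper for you, an <code>HQuorumPeer</code> process also appears.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ jps
+8853 HQuorumPeer
+8906 HMaster
+9031 HRegionServer
+9191 Jps</pre>
+</div>
+</div>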
+</li>
+<li>
+<p>Check the HBase directory in HDFS.</p>
+<div class="paragraph">
+<p>If everything worked correctly, HBase created its directory in HDFS.
+In the configuration above, it is stored in <em>/hbase/</em> on HDFS.
+You can use the <code>hadoop fs</code> command in Hadoop&#8217;s <em>bin/</em> directory to list this directory.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/hadoop fs -ls /hbase
+Found 7 items
+drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
+drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
+drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
+drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
+-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
+-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
+drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Create a table and populate it with data.</p>
+<div class="paragraph">
+<p>You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in <a href="#shell_exercises">shell exercises</a>.</p>
+</div>
+</li>
+<li>
+<p>Start and stop a backup HBase Master (HMaster) server.</p>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production.
+This step is offered for testing and learning purposes only.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>The HMaster server controls the HBase cluster.
+You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary.
+To start a backup HMaster, use the <code>local-master-backup.sh</code>.
+For each backup master you want to start, add a parameter representing the port offset for that master.
+Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032.
+The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/local-master-backup.sh 2 3 5</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like <em>/tmp/hbase-USER-X-master.pid</em>.
+The only content of the file is the PID.
+You can use the <code>kill -9</code> command to kill that PID.
+The following command will kill the master with port offset 1, but leave the cluster running:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ cat /tmp/hbase-testuser-1-master.pid | xargs kill -9</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Start and stop additional RegionServers</p>
+<div class="paragraph">
+<p>The HRegionServer manages the data in its StoreFiles as directed by the HMaster.
+Generally, one HRegionServer runs per node in the cluster.
+Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode.
+The <code>local-regionservers.sh</code> command allows you to run multiple RegionServers.
+It works in a similar way to the <code>local-master-backup.sh</code> command, in that each parameter you provide represents the port offset for an instance.
+Each RegionServer requires two ports, and the default ports are 16020 and 16030.
+However, the base ports for additional RegionServers are not the default ports since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0.
+The base ports are 16200 and 16300 instead.
+You can run up to 99 additional RegionServers that are not an HMaster or backup HMaster on a single server.
+The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/local-regionservers.sh start 2 3 4 5</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To stop a RegionServer manually, use the <code>local-regionservers.sh</code> command with the <code>stop</code> parameter and the offset of the server to stop.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ./bin/local-regionservers.sh stop 3</pre>
+</div>
+</div>
+</li>
+<li>
+<p>Stop HBase.</p>
+<div class="paragraph">
+<p>You can stop HBase the same way as in the <a href="#quickstart">quickstart</a> procedure, using the <em>bin/stop-hbase.sh</em> command.</p>
+</div>
+</li>
+</ol>
+</div>
+</div>
+<div class="sect2">
+<h3 id="quickstart_fully_distributed"><a class="anchor" href="#quickstart_fully_distributed"></a>2.4. Advanced - Fully Distributed</h3>
+<div class="paragraph">
+<p>In reality, you need a fully-distributed configuration to fully test HBase and to use it in real-world scenarios.
+In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons.
+These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.</p>
+</div>
+<div class="paragraph">
+<p>This advanced quickstart adds two more nodes to your cluster.
+The architecture will be as follows:</p>
+</div>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 1. Distributed Cluster Demo Architecture</caption>
+<colgroup>
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+<col style="width: 25%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Node Name</th>
+<th class="tableblock halign-left valign-top">Master</th>
+<th class="tableblock halign-left valign-top">ZooKeeper</th>
+<th class="tableblock halign-left valign-top">RegionServer</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">node-a.example.com</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">no</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">node-b.example.com</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">backup</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">node-c.example.com</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">no</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>This quickstart assumes that each node is a virtual machine and that they are all on the same network.
+It builds upon the previous quickstart, <a href="#quickstart_pseudo">Intermediate - Pseudo-Distributed Local Install</a>, assuming that the system you configured in that procedure is now <code>node-a</code>.
+Stop HBase on <code>node-a</code> before continuing.</p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other.
+If you see any errors like <code>no route to host</code>, check your firewall.
+</td>
+</tr>
+</table>
+</div>
+<div id="passwordless.ssh.quickstart" class="paragraph">
+<div class="title">Procedure: Configure Passwordless SSH Access</div>
+<p><code>node-a</code> needs to be able to log into <code>node-b</code> and <code>node-c</code> (and to itself) in order to start the daemons.
+The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from <code>node-a</code> to each of the others.</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>On <code>node-a</code>, generate a key pair.</p>
+<div class="paragraph">
+<p>While logged in as the user who will run HBase, generate a SSH key pair, using the following command:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bash">$ ssh-keygen -t rsa</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If the command succeeds, the location of the key pair is printed to standard output.
+The default name of the public key is <em>id_rsa.pub</em>.</p>
+</div>
+</li>
+<li>
+<p>Create the directory that will hold the shared keys on the other nodes.</p>
+<div class="paragraph">
+<p>On <code>node-b</code> and <code>node-c</code>, log in as the HBase user and create a <em>.ssh/</em> directory in the user&#8217;s home directory, if it does not already exist.
+If it already exists, be aware that it may already contain other keys.</p>
+</div>
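+<div class="paragraph">
+<p>One way to create the directory with suitable permissions (a sketch, assuming a standard OpenSSH setup):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ mkdir -p ~/.ssh
+$ chmod 700 ~/.ssh</pre>
+</div>
+</div>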
+</li>
+<li>
+<p>Copy the public key to the other nodes.</p>
+<div class="paragraph">
+<p>Securely copy the public key from <code>node-a</code> to each of the nodes, using <code>scp</code> or some other secure means.
+On each of the other nodes, create a new file called <em>.ssh/authorized_keys</em> <em>if it does
+not already exist</em>, and append the contents of the <em>id_rsa.pub</em> file to the end of it.
+Note that you also need to do this for <code>node-a</code> itself.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ cat id_rsa.pub &gt;&gt; ~/.ssh/authorized_keys</pre>
+</div>
+</div>
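+<div class="paragraph">
+<p>For example, one way to copy the key from <code>node-a</code> before appending it as shown above; the username <code>hbuser</code> is illustrative only.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ scp ~/.ssh/id_rsa.pub hbuser@node-b.example.com:
+$ scp ~/.ssh/id_rsa.pub hbuser@node-c.example.com:</pre>
+</div>
+</div>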
+</li>
+<li>
+<p>Test password-less login.</p>
+<div class="paragraph">
+<p>If you performed the procedure correctly, you should not be prompted for a password when you SSH from <code>node-a</code> to either of the other nodes using the same username.</p>
+</div>
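+<div class="paragraph">
+<p>For example, from <code>node-a</code> you should be able to run the following without a password prompt (repeat for <code>node-c</code>):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ssh node-b.example.com</pre>
+</div>
+</div>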
+</li>
+<li>
+<p>Since <code>node-b</code> will run a backup Master, repeat the procedure above, substituting <code>node-b</code> everywhere you see <code>node-a</code>.
+Be sure not to overwrite your existing <em>.ssh/authorized_keys</em> files, but concatenate the new key onto the existing file using the <code>&gt;&gt;</code> operator rather than the <code>&gt;</code> operator.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<div class="title">Procedure: Prepare <code>node-a</code></div>
+<p><code>node-a</code> will run your primary master and ZooKeeper processes, but no RegionServers, so the first step is to stop the RegionServer from starting on <code>node-a</code>.</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Edit <em>conf/regionservers</em> and remove the line which contains <code>localhost</code>. Add lines with the hostnames or IP addresses for <code>node-b</code> and <code>node-c</code>.</p>
+<div class="paragraph">
+<p>Even if you did want to run a RegionServer on <code>node-a</code>, you should refer to it by the hostname the other servers would use to communicate with it.
+In this case, that would be <code>node-a.example.com</code>.
+This enables you to distribute the configuration to each node of your cluster without any hostname conflicts.
+Save the file.</p>
+</div>
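+<div class="paragraph">
+<p>Using the example hostnames from this walk-through, <em>conf/regionservers</em> would contain:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>node-b.example.com
+node-c.example.com</pre>
+</div>
+</div>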
+</li>
+<li>
+<p>Configure HBase to use <code>node-b</code> as a backup master.</p>
+<div class="paragraph">
+<p>Create a new file in <em>conf/</em> called <em>backup-masters</em>, and add a new line to it with the hostname for <code>node-b</code>.
+In this demonstration, the hostname is <code>node-b.example.com</code>.</p>
+</div>
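+<div class="paragraph">
+<p>With that hostname, <em>conf/backup-masters</em> consists of a single line:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>node-b.example.com</pre>
+</div>
+</div>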
+</li>
+<li>
+<p>Configure ZooKeeper</p>
+<div class="paragraph">
+<p>In reality, you should carefully consider your ZooKeeper configuration.
+You can find out more about configuring ZooKeeper in <a href="#zookeeper">zookeeper</a>.
+This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.</p>
+</div>
+<div class="paragraph">
+<p>On <code>node-a</code>, edit <em>conf/hbase-site.xml</em> and add the following properties.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="xml"><span class="tag">&lt;property&gt;</span>
+  <span class="tag">&lt;name&gt;</span>hbase.zookeeper.quorum<span class="tag">&lt;/name&gt;</span>
+  <span class="tag">&lt;value&gt;</span>node-a.example.com,node-b.example.com,node-c.example.com<span class="tag">&lt;/value&gt;</span>
+<span class="tag">&lt;/property&gt;</span>
+<span class="tag">&lt;property&gt;</span>
+  <span class="tag">&lt;name&gt;</span>hbase.zookeeper.property.dataDir<span class="tag">&lt;/name&gt;</span>
+  <span class="tag">&lt;value&gt;</span>/usr/local/zookeeper<span class="tag">&lt;/value&gt;</span>
+<span class="tag">&lt;/property&gt;</span></code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>Everywhere in your configuration that you have referred to <code>node-a</code> as <code>localhost</code>, change the reference to point to the hostname that the other nodes will use to refer to <code>node-a</code>.
+In these examples, the hostname is <code>node-a.example.com</code>.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<div class="title">Procedure: Prepare <code>node-b</code> and <code>node-c</code></div>
+<p><code>node-b</code> will run a backup master server and a ZooKeeper instance.</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Download and unpack HBase.</p>
+<div class="paragraph">
+<p>Download and unpack HBase to <code>node-b</code>, just as you did for the standalone and pseudo-distributed quickstarts.</p>
+</div>
+</li>
+<li>
+<p>Copy the configuration files from <code>node-a</code> to <code>node-b</code> and <code>node-c</code>.</p>
+<div class="paragraph">
+<p>Each node of your cluster needs to have the same configuration information.
+Copy the contents of the <em>conf/</em> directory to the <em>conf/</em> directory on <code>node-b</code> and <code>node-c</code>.</p>
+</div>
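+<div class="paragraph">
+<p>One way to do this, assuming HBase is unpacked at the same path on every node, is with <code>scp</code>; the username and path below are illustrative only.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ scp conf/* hbuser@node-b.example.com:/home/hbuser/hbase/conf/
+$ scp conf/* hbuser@node-c.example.com:/home/hbuser/hbase/conf/</pre>
+</div>
+</div>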
+</li>
+</ol>
+</div>
+<div class="olist arabic">
+<div class="title">Procedure: Start and Test Your Cluster</div>
+<ol class="arabic">
+<li>
+<p>Be sure HBase is not running on any node.</p>
+<div class="paragraph">
+<p>If you forgot to stop HBase from previous testing, you will have errors.
+Check to see whether HBase is running on any of your nodes by using the <code>jps</code> command.
+Look for the processes <code>HMaster</code>, <code>HRegionServer</code>, and <code>HQuorumPeer</code>.
+If they exist, kill them.</p>
+</div>
+</li>
+<li>
+<p>Start the cluster.</p>
+<div class="paragraph">
+<p>On <code>node-a</code>, issue the <code>start-hbase.sh</code> command.
+Your output will be similar to that below.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ bin/start-hbase.sh
+node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
+node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
+node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
+starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
+node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
+node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
+node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.</p>
+</div>
+</li>
+<li>
+<p>Verify that the processes are running.</p>
+<div class="paragraph">
+<p>On each node of the cluster, run the <code>jps</code> command and verify that the correct processes are running on each server.
+You may see additional Java processes running on your servers as well, if they are used for other purposes.</p>
+</div>
+<div class="exampleblock">
+<div class="title">Example 3. <code>node-a</code> <code>jps</code> Output</div>
+<div class="content">
+<div class="listingblock">
+<div class="content">
+<pre>$ jps
+20355 Jps
+20071 HQuorumPeer
+20137 HMaster</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="exampleblock">
+<div class="title">Example 4. <code>node-b</code> <code>jps</code> Output</div>
+<div class="content">
+<div class="listingblock">
+<div class="content">
+<pre>$ jps
+15930 HRegionServer
+16194 Jps
+15838 HQuorumPeer
+16010 HMaster</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="exampleblock">
+<div class="title">Example 5. <code>node-a</code> <code>jps</code> Output</div>
+<div class="content">
+<div class="listingblock">
+<div class="content">
+<pre>$ jps
+13901 Jps
+13639 HQuorumPeer
+13737 HRegionServer</pre>
+</div>
+</div>
+</div>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="title">ZooKeeper Process Name</div>
+<div class="paragraph">
+<p>The <code>HQuorumPeer</code> process is a ZooKeeper instance which is controlled and started by HBase.
+If you use ZooKeeper this way, it is limited to one instance per cluster node, and is appropriate for testing only.
+If ZooKeeper is run outside of HBase, the process is called <code>QuorumPeer</code>.
+For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see <a href="#zookeeper">zookeeper</a>.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+</li>
+<li>
+<p>Browse to the Web UI.</p>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="title">Web UI Port Changes</div>
+<div class="paragraph">
+<p>In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the
+Master and 60030 for each RegionServer to 16010 for the Master and 16030 for each RegionServer.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>If everything is set up correctly, you should be able to use a web browser to connect to the UI for the Master
+at <code><a href="http://node-a.example.com:16010/" class="bare">http://node-a.example.com:16010/</a></code> or the secondary master at <code><a href="http://node-b.example.com:16010/" class="bare">http://node-b.example.com:16010/</a></code>.
+If you can connect via <code>localhost</code> but not from another host, check your firewall rules.
+You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by
+clicking their links in the web UI for the Master.</p>
+</div>
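+<div class="paragraph">
+<p>If no browser is available on the machine you are working from, a quick command-line connectivity check (a sketch; assumes <code>curl</code> is installed) is:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ curl -I http://node-a.example.com:16010/
+$ curl -I http://node-b.example.com:16010/
+$ curl -I http://node-c.example.com:16030/</pre>
+</div>
+</div>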
+</li>
+<li>
+<p>Test what happens when nodes or services disappear.</p>
+<div class="paragraph">
+<p>With a three-node cluster like you have configured, things will not be very resilient.
+Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.</p>
+</div>
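+<div class="paragraph">
+<p>For instance, to simulate the loss of a RegionServer you might kill its process on <code>node-c</code> and watch the Master react in its log on <code>node-a</code> (a sketch; the process id and log file name will differ on your cluster):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ ssh node-c.example.com
+$ jps | grep HRegionServer
+13737 HRegionServer
+$ kill -9 13737
+$ exit
+$ tail -f logs/hbase-hbuser-master-node-a.example.com.out</pre>
+</div>
+</div>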
+</li>
+</ol>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_where_to_go_next"><a class="anchor" href="#_where_to_go_next"></a>2.5. Where to go next</h3>
+<div class="paragraph">
+<p>The next chapter, <a href="#configuration">configuration</a>, gives more information about the different HBase run modes, system requirements for running HBase, and critical configuration areas for setting up a distributed HBase cluster.</p>
+</div>
+</div>
+</div>
+</div>
+<h1 id="configuration" class="sect0"><a class="anchor" href="#configuration"></a>Apache HBase Configuration</h1>
+<div class="openblock partintro">
+<div class="content">
+This chapter expands upon the <a href="#getting_started">Getting Started</a> chapter to further explain configuration of Apache HBase.
+Please read this chapter carefully, especially the <a href="#basic.prerequisites">Basic Prerequisites</a> to ensure that your HBase testing and deployment goes smoothly, and prevent data loss.
+</div>
+</div>
+<div class="sect1">
+<h2 id="_configuration_files"><a class="anchor" href="#_configuration_files"></a>3. Configuration Files</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Apache HBase uses the same configuration system as Apache Hadoop.
+All configuration files are located in the <em>conf/</em> directory, which needs to be kept in sync for each node on your cluster.</p>
+</div>
+<div class="dlist">
+<div class="title">HBase Configuration File Descriptions</div>
+<dl>
+<dt class="hdlist1"><em>backup-masters</em></dt>
+<dd>
+<p>Not present by default.
+A plain-text file which lists hosts on which the Master should start a backup Master process, one host per line.</p>
+</dd>
+<dt class="hdlist1"><em>hadoop-metrics2-hbase.properties</em></dt>
+<dd>
+<p>Used to connect HBase to Hadoop&#8217;s Metrics2 framework.
+See the <a href="http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2">Hadoop Wiki entry</a> for more information on Metrics2.
+Contains only commented-out examples by default.</p>
+</dd>
+<dt class="hdlist1"><em>hbase-env.cmd</em> and <em>hbase-env.sh</em></dt>
+<dd>
+<p>Script for Windows and Linux / Unix environments to set up the working environment for HBase, including the location of Java, Java options, and other environment variables.
+The file contains many commented-out examples to provide guidance.</p>
+</dd>
+<dt class="hdlist1"><em>hbase-policy.xml</em></dt>
+<dd>
+<p>The default policy configuration file used by RPC servers to make authorization decisions on client requests.
+Only used if HBase <a href="#security">security</a> is enabled.</p>
+</dd>
+<dt class="hdlist1"><em>hbase-site.xml</em></dt>
+<dd>
+<p>The main HBase configuration file.
+This file specifies configuration options which override HBase&#8217;s default configuration.
+You can view (but do not edit) the default configuration file at <em>docs/hbase-default.xml</em>.
+You can also view the entire effective configuration for your cluster (defaults and overrides) in the <span class="label">HBase Configuration</span> tab of the HBase Web UI.</p>
+</dd>
+<dt class="hdlist1"><em>log4j.properties</em></dt>
+<dd>
+<p>Configuration file for HBase logging via <code>log4j</code>.</p>
+</dd>
+<dt class="hdlist1"><em>regionservers</em></dt>
+<dd>
+<p>A plain-text file containing a list of hosts which should run a RegionServer in your HBase cluster.
+By default this file contains the single entry <code>localhost</code>.
+It should contain a list of hostnames or IP addresses, one per line, and should only contain <code>localhost</code> if each node in your cluster will run a RegionServer on its <code>localhost</code> interface.</p>
+</dd>
+</dl>
+</div>
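+<div class="paragraph">
+<p>For instance, in a small three-node deployment the <em>regionservers</em> and <em>backup-masters</em> files described above might contain nothing more than one hostname per line (a sketch; the hostnames are examples only):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ cat conf/regionservers
+node-b.example.com
+node-c.example.com
+$ cat conf/backup-masters
+node-b.example.com</pre>
+</div>
+</div>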
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+<div class="title">Checking XML Validity</div>
+<div class="paragraph">
+<p>When you edit XML, it is a good idea to use an XML-aware editor to be sure that your syntax is correct and your XML is well-formed.
+You can also use the <code>xmllint</code> utility to check that your XML is well-formed.
+By default, <code>xmllint</code> re-flows and prints the XML to standard output.
+To check for well-formedness and only print output if errors exist, use the command <code>xmllint -noout filename.xml</code>.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
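+<div class="paragraph">
+<p>For example, to check the main configuration file for well-formedness without echoing it back (a sketch; run from the HBase installation directory):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ xmllint -noout conf/hbase-site.xml</pre>
+</div>
+</div>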
+<div class="admonitionblock warning">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-warning" title="Warning"></i>
+</td>
+<td class="content">
+<div class="title">Keep Configuration In Sync Across the Cluster</div>
+<div class="paragraph">
+<p>When running in distributed mode, after you make an edit to an HBase configuration, make sure you copy the content of the <em>conf/</em> directory to all nodes of the cluster.
+HBase will not do this for you.
+Use <code>rsync</code>, <code>scp</code>, or another secure mechanism for copying the configuration files to your nodes.
+For most configurations, a restart is needed for servers to pick up changes.
+An exception is dynamic configuration, which is described later in this chapter.
+</div>
+</td>
+</tr>
+</table>
+</div>
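+<div class="paragraph">
+<p>For example, a simple way to push an edited <em>conf/</em> directory from the node where you changed it to the rest of the cluster (a sketch; the hostnames and installation path are those of the example cluster and will differ on yours) is:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ for host in node-b.example.com node-c.example.com; do rsync -az conf/ ${host}:/home/hbuser/hbase-0.98.3-hadoop2/conf/; done</pre>
+</div>
+</div>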
+</div>
+</div>
+<div class="sect1">
+<h2 id="basic.prerequisites"><a class="anchor" href="#basic.prerequisites"></a>4. Basic Prerequisites</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This section lists required services and some required system configuration.</p>
+</div>
+<table id="java" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 2. Java</caption>
+<colgroup>
+<col style="width: 14%;">
+<col style="width: 14%;">
+<col style="width: 14%;">
+<col style="width: 57%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">HBase Version</th>
+<th class="tableblock halign-left valign-top">JDK 6</th>
+<th class="tableblock halign-left valign-top">JDK 7</th>
+<th class="tableblock halign-left valign-top">JDK 8</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1.2</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="http://search-hadoop.com/m/DHED4Zlz0R1">Not Supported</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1.1</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="http://search-hadoop.com/m/DHED4Zlz0R1">Not Supported</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Running with JDK 8 will work but is not well tested.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1.0</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="http://search-hadoop.com/m/DHED4Zlz0R1">Not Supported</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Running with JDK 8 will work but is not well tested.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">0.98</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Running with JDK 8 works but is not well tested. Building with JDK 8 would require removal of the
+deprecated <code>remove()</code> method of the <code>PoolMap</code> class and is under consideration. See
+<a href="https://issues.apache.org/jira/browse/HBASE-7608">HBASE-7608</a> for more information about JDK 8
+support.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">0.94</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">yes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">N/A</p></td>
+</tr>
+</tbody>
+</table>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+In HBase 0.98.5 and newer, you must set <code>JAVA_HOME</code> on each node of your cluster. <em>hbase-env.sh</em> provides a handy mechanism to do this.
+</td>
+</tr>
+</table>
+</div>
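+<div class="paragraph">
+<p>For example, you could uncomment and edit the corresponding line in <em>conf/hbase-env.sh</em> on every node (a sketch; the JDK path is hypothetical and depends on where Java is installed on your systems):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre># in conf/hbase-env.sh
+export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64</pre>
+</div>
+</div>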
+<div class="dlist">
+<div class="title">Operating System Utilities</div>
+<dl>
+<dt class="hdlist1">ssh</dt>
+<dd>
+<p>HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running <code>ssh</code> so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<a href="#passwordless.ssh.quickstart">Procedure: Configure Passwordless SSH Access</a>". If your cluster nodes use OS X, see the section, <a href="http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29">SSH: Setting up Remote Desktop and Enabling Self-Login</a> on the Hadoop wiki.</p>
+</dd>
+<dt class="hdlist1">DNS</dt>
+<dd>
+<p>HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving must work in versions of HBase previous to 0.92.0. The <a href="https://github.com/sujee/hadoop-dns-checker">hadoop-dns-checker</a> tool can be used to verify DNS is working correctly on the cluster. The project <code>README</code> file provides detailed instructions on usage.</p>
+</dd>
+<dt class="hdlist1">Loopback IP</dt>
+<dd>
+<p>Prior to hbase-0.96.0, HBase only used the IP address <code>127.0.0.1</code> to refer to <code>localhost</code>, and this could not be configured.
+See <a href="#loopback.ip">Loopback IP</a> for more details.</p>
+</dd>
+<dt class="hdlist1">NTP</dt>
+<dd>
+<p>The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism, on your cluster, and that all nodes look to the same service for time synchronization. See the <a href="http://www.tldp.org/LDP/sag/html/basic-ntp-config.html">Basic NTP Configuration</a> at <em class="citetitle">The Linux Documentation Project (TLDP)</em> to set up NTP.</p>
+</dd>
+<dt class="hdlist1">Limits on Number of Files and Processes (ulimit)</dt>
+<dd>
+<p>Apache HBase is a database. It requires the ability to open a large number of files at once. Many Linux distributions limit the number of files a single user is allowed to open to <code>1024</code> (or <code>256</code> on older versions of OS X). You can check this limit on your servers by running the command <code>ulimit -n</code> when logged in as the user which runs HBase. See <a href="#trouble.rs.runtime.filehandles">the Troubleshooting section</a> for some of the problems you may experience if the limit is too low. You may also notice errors such as the following:</p>
+<div class="listingblock">
+<div class="content">
+<pre>2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception increateBlockOutputStream java.io.EOFException
+2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>It is recommended to raise the ulimit to at least 10,000, but more likely 10,240, because the value is usually expressed in multiples of 1024. Each ColumnFamily has at least one StoreFile, and possibly more than six StoreFiles if the region is under load. The number of open files required depends upon the number of ColumnFamilies and the number of regions. The following is a rough formula for calculating the potential number of open files on a RegionServer.</p>
+</div>
+<div class="listingblock">
+<div class="title">Calculate the Potential Number of Open Files</div>
+<div class="content">
+<pre>(StoreFiles per ColumnFamily) x (regions per RegionServer)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>For example, assuming that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open <code>3 * 3 * 100 = 900</code> file descriptors, not counting open JAR files, configuration files, and others. Opening a file does not take many resources, and the risk of allowing a user to open too many files is minimal.</p>
+</div>
+<div class="paragraph">
+<p>Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the <code>ulimit -u</code> command. This should not be confused with the <code>nproc</code> command, which reports the number of processing units available to a given user. Under load, a <code>ulimit -u</code> that is too low can cause OutOfMemoryError exceptions. See Jack Levin&#8217;s major HDFS issues thread on the hbase-users mailing list, from 2011.</p>
+</div>
+<div class="paragraph">
+<p>Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user&#8217;s ulimit configuration, look at the first line of the HBase log for that instance. A useful read on setting configuration for your Hadoop cluster is Aaron Kimball&#8217;s <em class="citetitle">Configuration Parameters: What can you just ignore?</em></p>
+</div>
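+<div class="paragraph">
+<p>For example, to see the environment recorded at the top of the Master log on the example cluster (a sketch; the log file name follows the pattern shown earlier and will differ on your cluster):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$ head logs/hbase-hbuser-master-node-a.example.com.out</pre>
+</div>
+</div>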
+<div class="exampleblock">
+<div class="title">Example 6. <code>ulimit</code> Settings on Ubuntu</div>
+<div class="content">
+<div class="paragraph">
+<p>To configure ulimit settings on Ubuntu, edit <em>/etc/security/limits.conf</em>, which is a space-delimited file with four columns. Refer to the man page for <em>limits.conf</em> for details about the format of this file. In the following example, the first line sets both soft and hard limits for the number of open files (nofile) to 32768 for the operating system user with the username hadoop. The second line sets the number of processes to 32000 for the same user.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>hadoop  -       nofile  32768
+hadoop  -       nproc   32000</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The settings are only applied if the Pluggable Authentication Module (PAM) environment is directed to use them. To configure PAM to use these limits, be sure that the <em>/etc/pam.d/common-session</em> file contains the following line:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>session required  pam_limits.so</pre>
+</div>
+</div>
+</div>
+</div>
+</dd>
+<dt class="hdlist1">Linux Shell</dt>
+<dd>
+<p>All of the shell scripts that come with HBase rely on the <a href="http://www.gnu.org/software/bash">GNU Bash</a> shell.</p>
+</dd>
+<dt class="hdlist1">Windows</dt>
+<dd>
+<p>Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited.
+Running HBase on Windows nodes is not recommended for production systems.</p>
+</dd>
+</dl>
+</div>
+<div class="sect2">
+<h3 id="hadoop"><a class="anchor" href="#hadoop"></a>4.1. <a href="http://hadoop.apache.org">Hadoop</a></h3>
+<div class="paragraph">
+<p>The following table summarizes the versions of Hadoop supported with each version of HBase.
+Based on the version of HBase, you should select the most appropriate version of Hadoop.
+You can use Apache Hadoop, or a vendor&#8217;s distribution of Hadoop.
+No distinction is made here.
+See <a href="http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support">the Hadoop wiki</a> for information about vendors of Hadoop.</p>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+<div class="title">Hadoop 2.x is recommended.</div>
+<div class="paragraph">
+<p>Hadoop 2.x is faster and includes features, such as short-circuit reads, which will help improve your HBase random read profile.
+Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience.
+HBase 0.98 drops support for Hadoop 1.0, deprecates use of Hadoop 1.1+, and HBase 1.0 will not support Hadoop 1.x.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>Use the following legend to interpret this table:</p>
+</div>
+<div class="ulist">
+<div class="title">Hadoop version support matrix</div>
+<ul>
+<li>
+<p>"S" = supported</p>
+</li>
+<li>
+<p>"X" = not supported</p>
+</li>
+<li>
+<p>"NT" = Not tested</p>
+</li>
+</ul>
+</div>
+<table class="tableblock frame-all grid-all spread">
+<colgroup>
+<col style="width: 16%;">
+<col style="width: 16%;">
+<col style="width: 16%;">
+<col style="width: 16%;">
+<col style="width: 16%;">
+<col style="width: 16%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"></th>
+<th class="tableblock halign-left valign-top">HBase-0.94.x</th>
+<th class="tableblock halign-left valign-top">HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.)</th>
+<th class="tableblock halign-left valign-top">HBase-1.0.x (Hadoop 1.x is NOT supported)</th>
+<th class="tableblock halign-left valign-top">HBase-1.1.x</th>
+<th class="tableblock halign-left valign-top">HBase-1.2.x</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-1.0.x</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-1.1.x</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-0.23.x</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.0.x-alpha</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.1.0-beta</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.2.0</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.3.x</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.4.x</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.5.x</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.6.x</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.7.0</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">X</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hadoop-2.7.1+</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">NT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">S</p></td>
+</tr>
+</tbody>
+</table>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+<div class="title">Hadoop 2.6.x</div>
+<div class="paragraph">
+<p>Hadoop distributions based on the 2.6.x line <strong>must</strong> have
+<a href="https://issues.apache.org/jira/browse/HADOOP-11710">HADOOP-11710</a> applied if you plan to run
+HBase on top of an HDFS Encryption Zone. Failure to do so will result in cluster failure and
+data loss.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="admonitionblock tip">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-tip" title="Tip"></i>
+</td>
+<td class="content">
+<div class="title">Hadoop 2.7.x</div>
+<div class="paragraph">
+<p>Hadoop version 2.7.0 is not tested or supported as the Hadoop PMC has explicitly labeled that release as not being stable.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="title">Replace the Hadoop Bundled With HBase!</div>
+<div class="paragraph">
+<p>Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its <em>lib</em> directory.
+The bundled jar is ONLY for use in standalone mode.
+In distributed mode, it is <em>critical</em> that the version of Hadoop that is out on your cluster match what is under HBase.
+Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues.
+Make sure you replace the jar in HBase everywhere on your cluster.
+Hadoop version mismatch issues have various manifestations, but often the cluster simply looks like it has hung.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="sect3">
+<h4 id="hadoop2.hbase_0.94"><a class="anchor" href="#hadoop2.hbase_0.94"></a>4.1.1. Apache HBase 0.94 with Hadoop 2</h4>
+<div class="paragraph">
+<p>To get 0.94.x to run on Hadoop 2.2.0, you need to change the hadoop 2 and protobuf versions in the <em>pom.xml</em>: Here is a diff with pom.xml changes:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="java"><span class="error">$</span> svn diff pom.xml
+Index: pom.xml
+===================================================================
+--- pom.xml     (revision <span class="integer">1545157</span>)
++++ pom.xml     (working copy)
+<span class="error">@</span><span class="error">@</span> -<span class="integer">1034</span>,<span class="integer">7</span> +<span class="integer">1034</span>,<span class="integer">7</span> <span class="error">@</span><span class="error">@</span>
+     &lt;slf4j.version&gt;<span class="float">1.4</span><span class="float">.3</span>&lt;/slf4j.version&gt;
+     &lt;log4j.version&gt;<span class="float">1.2</span><span class="float">.16</span>&lt;/log4j.version&gt;
+     &lt;mockito-all.version&gt;<span class="float">1.8</span><span class="float">.5</span>&lt;/mockito-all.version&gt;
+-    &lt;protobuf.version&gt;<span class="float">2.4</span><span class="float">.0</span>a&lt;/protobuf.version&gt;
++    &lt;protobuf.version&gt;<span class="float">2.5</span><span class="float">.0</span>&lt;/protobuf.version&gt;
+     &lt;stax-api.version&gt;<span class="float">1.0</span><span class="float">.1</span>&lt;/stax-api.version&gt;
+     &lt;thrift.version&gt;<span class="float">0.8</span><span class="float">.0</span>&lt;/thrift.version&gt;
+     &lt;zookeeper.version&gt;<span class="float">3.4</span><span class="float">.5</span>&lt;/zookeeper.version&gt;
+<span class="error">@</span><span class="error">@</span> -<span class="integer">2241</span>,<span class="integer">7</span> +<span class="integer">2241</span>,<span class="integer">7</span> <span class="error">@</span><span class="error">@</span>
+         &lt;/property&gt;
+       &lt;/activation&gt;
+       &lt;properties&gt;
+-        &lt;hadoop.version&gt;<span class="float">2.0</span><span class="float">.0</span>-alpha&lt;/hadoop.version&gt;
++        &lt;hadoop.version&gt;<span class="float">2.2</span><span class="float">.0</span>&lt;/hadoop.version&gt;
+         &lt;slf4j.version&gt;<span class="float">1.6</span><span class="float">.1</span>&lt;/slf4j.version&gt;
+       &lt;/properties&gt;
+       &lt;dependencies&gt;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The next step is to regenerate the Protobuf files, assuming that Protobuf has been installed:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Go to the HBase root folder, using the command line;</p>
+</li>
+<li>
+<p>Type the following commands:</p>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bourne">$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/hbase.proto</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bourne">$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/ErrorHandling.proto</code></pre>
+</div>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Build against the hadoop 2 profile by running something like the following command:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>$  mvn clean install assembly:single -Dhadoop.profile=2.0 -DskipTests</pre>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="hadoop.hbase_0.94"><a class="anchor" href="#hadoop.hbase_0.94"></a>4.1.2. Apache HBase 0.92 and 0.94</h4>
+<div class="paragraph">
+<p>HBase 0.92 and 0.94 versions can work with Hadoop versions, 0.20.205, 0.22.x, 1.0.x, and 1.1.x.
+HBase-0.94 can additionally work with Hadoop-0.23.x and 2.x, but you may have to recompile the code using the specific maven profile (see the top-level pom.xml).</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="hadoop.hbase_0.96"><a class="anchor" href="#hadoop.hbase_0.96"></a>4.1.3. Apache HBase 0.96</h4>
+<div class="paragraph">
+<p>As of Apache HBase 0.96.x, at least Apache Hadoop 1.0.x is required.
+Hadoop 2 is strongly encouraged (it is faster and includes fixes that help MTTR). HBase will no longer run properly on older Hadoop versions such as 0.20.205 or branch-0.20-append.
+Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop. See <a href="http://search-hadoop.com/m/7vFVx4EsUb2">HBase, mail # dev - DISCUSS:
+                Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?</a></p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="hadoop.older.versions"><a class="anchor" href="#hadoop.older.versions"></a>4.1.4. Hadoop versions 0.20.x - 1.x</h4>
+<div class="paragraph">
+<p>HBase will lose data unless it is running on an HDFS that has a durable <code>sync</code> implementation.
+DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, and Hadoop 0.20.204.0 which DO NOT have this attribute.
+Currently only Hadoop versions 0.20.205.x or any release in excess of this version&#8201;&#8212;&#8201;this includes hadoop-1.0.0&#8201;&#8212;&#8201;have a working, durable sync.
+The Cloudera blog post <a href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An
+update on Apache Hadoop 1.0</a> by Charles Zedlewski has a nice exposition on how all the Hadoop versions relate.
+It&#8217;s worth checking out if you are having trouble making sense of the Hadoop version morass.</p>
+</div>
+<div class="paragraph">
+<p>Sync has to be explicitly enabled by setting <code>dfs.support.append</code> equal to true on both the client side&#8201;&#8212;&#8201;in <em>hbase-site.xml</em>&#8201;&#8212;&#8201;and on the serverside in <em>hdfs-site.xml</em> (The sync facility HBase needs is a subset of the append code path).</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="xml"><span class="tag">&lt;property&gt;</span>
+  <span class="tag">&lt;name&gt;</span>dfs.support.append<span class="tag">&lt;/name&gt;</span>

[... 29670 lines stripped ...]