Posted to commits@hbase.apache.org by bu...@apache.org on 2015/07/14 04:49:38 UTC
[13/15] hbase git commit: HBASE-14066 clean out old docbook docs from branch-1.
http://git-wip-us.apache.org/repos/asf/hbase/blob/fdd2692f/src/main/docbkx/case_studies.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/case_studies.xml b/src/main/docbkx/case_studies.xml
deleted file mode 100644
index 332caf8..0000000
--- a/src/main/docbkx/case_studies.xml
+++ /dev/null
@@ -1,239 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter
- version="5.0"
- xml:id="casestudies"
- xmlns="http://docbook.org/ns/docbook"
- xmlns:xlink="http://www.w3.org/1999/xlink"
- xmlns:xi="http://www.w3.org/2001/XInclude"
- xmlns:svg="http://www.w3.org/2000/svg"
- xmlns:m="http://www.w3.org/1998/Math/MathML"
- xmlns:html="http://www.w3.org/1999/xhtml"
- xmlns:db="http://docbook.org/ns/docbook">
- <!--
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
- <title>Apache HBase Case Studies</title>
- <section
- xml:id="casestudies.overview">
- <title>Overview</title>
- <para> This chapter describes several performance and troubleshooting case studies that
- can serve as a useful blueprint for diagnosing Apache HBase cluster issues. </para>
- <para> For more information on Performance and Troubleshooting, see <xref
- linkend="performance" /> and <xref
- linkend="trouble" />. </para>
- </section>
-
- <section
- xml:id="casestudies.schema">
- <title>Schema Design</title>
- <para>See the schema design case studies here: <xref
- linkend="schema.casestudies" />
- </para>
-
- </section>
- <!-- schema design -->
-
- <section
- xml:id="casestudies.perftroub">
- <title>Performance/Troubleshooting</title>
-
- <section
- xml:id="casestudies.slownode">
- <title>Case Study #1 (Performance Issue On A Single Node)</title>
- <section>
- <title>Scenario</title>
- <para> Following a scheduled reboot, one datanode began exhibiting unusual behavior.
- Routine MapReduce jobs run against HBase tables regularly completed in five or six
- minutes; after the reboot, they began taking 30 or 40 minutes to finish. These jobs were
- consistently found to be waiting on map and reduce tasks assigned to the troubled datanode
- (for example, the slow map tasks all had the same input split). The situation came to a
- head during a distributed copy, which was severely prolonged by the lagging node. </para>
- </section>
- <section>
- <title>Hardware</title>
- <itemizedlist>
- <title>Datanodes:</title>
- <listitem>
- <para>Two 12-core processors</para>
- </listitem>
- <listitem>
- <para>Six enterprise SATA disks</para>
- </listitem>
- <listitem>
- <para>24GB of RAM</para>
- </listitem>
- <listitem>
- <para>Two bonded gigabit NICs</para>
- </listitem>
- </itemizedlist>
- <itemizedlist>
- <title>Network:</title>
- <listitem>
- <para>10 Gigabit top-of-rack switches</para>
- </listitem>
- <listitem>
- <para>20 Gigabit bonded interconnects between racks</para>
- </listitem>
- </itemizedlist>
- </section>
- <section>
- <title>Hypotheses</title>
- <section>
- <title>HBase "Hot Spot" Region</title>
- <para> We hypothesized that we were experiencing a familiar point of pain: a "hot spot"
- region in an HBase table, where uneven key-space distribution can funnel a huge number
- of requests to a single HBase region, bombarding the RegionServer process and causing slow
- response times. Examination of the HBase Master status page showed that the number of
- HBase requests to the troubled node was almost zero. Further, examination of the HBase
- logs showed that there were no region splits, compactions, or other region transitions
- in progress. This effectively ruled out a "hot spot" as the root cause of the observed
- slowness. </para>
- </section>
- <section>
- <title>HBase Region With Non-Local Data</title>
- <para> Our next hypothesis was that one of the MapReduce tasks was requesting data from
- HBase that was not local to the datanode, thus forcing HDFS to request data blocks from
- other servers over the network. Examination of the datanode logs showed that there were
- very few blocks being requested over the network, indicating that the HBase region was
- correctly assigned, and that the majority of the necessary data was located on the node.
- This ruled out the possibility of non-local data causing a slowdown. </para>
- </section>
- <section>
- <title>Excessive I/O Wait Due To Swapping Or An Over-Worked Or Failing Hard Disk</title>
- <para> After concluding that Hadoop and HBase were not likely to be the culprits, we
- moved on to troubleshooting the datanode's hardware. By design, the JVM's garbage collector
- periodically walks large portions of the Java heap. If system memory is heavily
- overcommitted, the Linux kernel may enter a vicious cycle, using up all of its resources
- swapping the Java heap back and forth between disk and RAM as the JVM tries to run garbage
- collection. Further, a failing hard disk will often retry reads and/or writes many times
- before giving up and returning an error. This can manifest as high iowait, as running
- processes wait for reads and writes to complete. Finally, a disk nearing the upper edge
- of its performance envelope will begin to cause iowait as it informs the kernel that it
- cannot accept any more data, and the kernel queues incoming data into the dirty write
- pool in memory. However, using <code>vmstat(1)</code> and <code>free(1)</code>, we could
- see that no swap was being used, and the amount of disk IO was only a few kilobytes per
- second. </para>
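- <para> An illustrative way to repeat these checks from a shell. The one-second interval and
- five samples are arbitrary choices, not taken from the original investigation. Note that the
- first line of <code>vmstat</code> output reports averages since boot: </para>
- <screen language="bourne">
-$ free -m # "Swap:" row: the "used" column shows how much swap is in use
-$ vmstat 1 5 # si/so = swap in/out, bi/bo = block I/O, wa = % of CPU time in iowait
- </screen>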
- </section>
- <section>
- <title>Slowness Due To High Processor Usage</title>
- <para> Next, we checked whether the system was slow simply because of very
- high computational load. <code>top(1)</code> showed that the system load was higher than
- normal, but <code>vmstat(1)</code> and <code>mpstat(1)</code> showed that the share of
- processor time spent on actual computation was low. </para>
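- <para> A sketch for comparing system load against actual CPU use. <code>mpstat(1)</code>
- ships with the sysstat package, and its column labels vary slightly between versions. The
- interval and sample count are arbitrary: </para>
- <screen language="bourne">
-$ uptime # 1-, 5- and 15-minute load averages, the same figures top(1) displays
-$ mpstat -P ALL 1 5 # per-CPU breakdown: %usr and %sys (computation), %iowait, %idle
- </screen>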
- </section>
- <section>
- <title>Network Saturation (The Winner)</title>
- <para> Since neither the disks nor the processors were being utilized heavily, we moved on
- to the performance of the network interfaces. The datanode had two gigabit Ethernet
- adapters, bonded to form an active-standby interface. <code>ifconfig(8)</code> showed
- some anomalies, namely interface errors, overruns, and framing errors. While not
- unheard of, these kinds of errors are exceedingly rare on modern hardware that is
- operating as it should: </para>
- <screen language="bourne">
-$ /sbin/ifconfig bond0
-bond0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
-inet addr:10.x.x.x Bcast:10.x.x.255 Mask:255.255.255.0
-UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
-RX packets:2990700159 errors:12 dropped:0 overruns:1 frame:6 <--- Look Here! Errors!
-TX packets:3443518196 errors:0 dropped:0 overruns:0 carrier:0
-collisions:0 txqueuelen:0
-RX bytes:2416328868676 (2.4 TB) TX bytes:3464991094001 (3.4 TB)
- </screen>
- <para> These errors immediately led us to suspect that one or more of the Ethernet
- interfaces might have negotiated the wrong line speed. This was confirmed both by
- running an ICMP ping from an external host and observing round-trip times in excess of
- 700ms, and by running <code>ethtool(8)</code> on the members of the bond interface and
- discovering that the active interface was operating at 100Mb/s, full duplex. </para>
- <screen language="bourne">
-$ sudo ethtool eth0
-Settings for eth0:
-Supported ports: [ TP ]
-Supported link modes: 10baseT/Half 10baseT/Full
- 100baseT/Half 100baseT/Full
- 1000baseT/Full
-Supports auto-negotiation: Yes
-Advertised link modes: 10baseT/Half 10baseT/Full
- 100baseT/Half 100baseT/Full
- 1000baseT/Full
-Advertised pause frame use: No
-Advertised auto-negotiation: Yes
-Link partner advertised link modes: Not reported
-Link partner advertised pause frame use: No
-Link partner advertised auto-negotiation: No
-Speed: 100Mb/s <--- Look Here! Should say 1000Mb/s!
-Duplex: Full
-Port: Twisted Pair
-PHYAD: 1
-Transceiver: internal
-Auto-negotiation: on
-MDI-X: Unknown
-Supports Wake-on: umbg
-Wake-on: g
-Current message level: 0x00000003 (3)
-Link detected: yes
- </screen>
- <para> In normal operation, the ICMP ping round-trip time should be around 20ms, and the
- interface speed and duplex should read "1000Mb/s" and "Full", respectively. </para>
- </section>
- </section>
- <section>
- <title>Resolution</title>
- <para> After determining that the active Ethernet adapter was running at the incorrect
- speed, we used the <code>ifenslave(8)</code> command to make the standby interface the
- active interface. This yielded an immediate improvement in MapReduce performance and a
- tenfold improvement in network throughput. </para>
- <para> On the next trip to the datacenter, we determined that the line speed issue was
- ultimately caused by a bad network cable, which was replaced. </para>
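- <para> A sketch of that failover, assuming the bond is <code>bond0</code>, the degraded
- active member is <code>eth0</code> (as in the <code>ethtool</code> output above), and the
- healthy standby is <code>eth1</code>. The standby's name is an assumption, not taken from
- the original incident: </para>
- <screen language="bourne">
-$ grep 'Currently Active Slave' /proc/net/bonding/bond0 # confirm which member is active
-$ sudo ethtool eth1 | grep -E 'Speed|Duplex' # confirm the standby negotiated 1000Mb/s, Full
-$ sudo ifenslave -c bond0 eth1 # make eth1 the active member of bond0
-$ grep 'Currently Active Slave' /proc/net/bonding/bond0 # should now report eth1
- </screen>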
- </section>
- </section>
- <!-- case study -->
- <section
- xml:id="casestudies.perf.1">
- <title>Case Study #2 (Performance Research 2012)</title>
- <para> Investigation results of a self-described "we're not sure what's wrong, but it seems
- slow" problem. <link
- xlink:href="http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html">http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html</link>
- </para>
- </section>
-
- <section
- xml:id="casestudies.perf.2">
- <title>Case Study #3 (Performance Research 2010)</title>
- <para> Investigation results of general cluster performance from 2010. Although this research
- is on an older version of the codebase, this writeup is still very useful in terms of
- approach. <link
- xlink:href="http://hstack.org/hbase-performance-testing/">http://hstack.org/hbase-performance-testing/</link>
- </para>
- </section>
-
- <section
- xml:id="casestudies.max.transfer.threads">
- <title>Case Study #4 (max.transfer.threads Config)</title>
- <para> Case study of configuring <code>dfs.datanode.max.transfer.threads</code> (previously
- known as <code>dfs.datanode.max.xcievers</code>) and diagnosing errors from misconfigurations. <link
- xlink:href="http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html">http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html</link>
- </para>
- <para> See also <xref
- linkend="dfs.datanode.max.transfer.threads" />. </para>
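- <para> For illustration, the property is set in <code>hdfs-site.xml</code> on each
- DataNode. The value below is only an example. Consult the linked write-up and the
- documentation for your Hadoop version before choosing one: </para>
- <programlisting language="xml"><![CDATA[
-<property>
-  <name>dfs.datanode.max.transfer.threads</name>
-  <value>4096</value>
-</property>
-]]></programlisting>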
- </section>
-
- </section>
- <!-- performance/troubleshooting -->
-
-</chapter>
http://git-wip-us.apache.org/repos/asf/hbase/blob/fdd2692f/src/main/docbkx/community.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/community.xml b/src/main/docbkx/community.xml
deleted file mode 100644
index 813f356..0000000
--- a/src/main/docbkx/community.xml
+++ /dev/null
@@ -1,149 +0,0 @@
-<?xml version="1.0"?>
-<chapter
- xml:id="community"
- version="5.0"
- xmlns="http://docbook.org/ns/docbook"
- xmlns:xlink="http://www.w3.org/1999/xlink"
- xmlns:xi="http://www.w3.org/2001/XInclude"
- xmlns:svg="http://www.w3.org/2000/svg"
- xmlns:m="http://www.w3.org/1998/Math/MathML"
- xmlns:html="http://www.w3.org/1999/xhtml"
- xmlns:db="http://docbook.org/ns/docbook">
- <!--
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
- <title>Community</title>
- <section
- xml:id="decisions">
- <title>Decisions</title>
- <section
- xml:id="feature_branches">
- <title>Feature Branches</title>
- <para>Feature branches are easy to make, and you do not have to be a committer to make one.
- Just ask on the developer mailing list for the name of your branch to be added to JIRA, and a
- committer will add it for you. Thereafter, you can file issues against your feature branch in
- Apache HBase JIRA. Keep your code elsewhere (it should be public so that it can be observed)
- and post progress updates to the dev mailing list. When the feature is ready for commit, three
- +1s from committers will get your feature merged. See <link
- xlink:href="http://search-hadoop.com/m/asM982C5FkS1">HBase, mail # dev - Thoughts
- about large feature dev branches</link></para>
- </section>
- <section
- xml:id="patchplusonepolicy">
- <title>Patch +1 Policy</title>
- <para> The policy below was put in place in September 2012. It is a suggested policy rather
- than a hard requirement. We want to try it first to see if it works before we cast it in
- stone. </para>
- <para> Apache HBase is made of <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel">components</link>.
- Components have one or more <xref
- linkend="OWNER" />s. See the 'Description' field on the <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel">components</link>
- JIRA page for the current owners of each component. </para>
- <para> Patches that fit within the scope of a single Apache HBase component require at least
- a +1 from one of the component's owners before commit. If the owners are absent (busy or
- otherwise), two +1s from non-owners will suffice. </para>
- <para> Patches that span components need at least two +1s before they can be committed,
- preferably +1s from owners of the components touched by the cross-component patch (TODO: This needs
- tightening up but I think fine for first pass). </para>
- <para> A -1 on a patch by anyone vetoes it; the patch cannot be committed until the
- justification for the -1 is addressed. </para>
- </section>
- <section
- xml:id="hbase.fix.version.in.JIRA">
- <title>How to set fix version in JIRA on issue resolve</title>
- <para>Here is how <link
- xlink:href="http://search-hadoop.com/m/azemIi5RCJ1">we agreed</link> to set versions in
- JIRA when we resolve an issue. If trunk is going to be 0.98.0 then: </para>
- <itemizedlist>
- <listitem>
- <para> Commit only to trunk: Mark with 0.98 </para>
- </listitem>
- <listitem>
- <para> Commit to 0.95 and trunk: Mark with 0.98, and 0.95.x </para>
- </listitem>
- <listitem>
- <para> Commit to 0.94.x and 0.95, and trunk: Mark with 0.98, 0.95.x, and 0.94.x </para>
- </listitem>
- <listitem>
- <para> Commit to 89-fb: Mark with 89-fb. </para>
- </listitem>
- <listitem>
- <para> Commit site fixes: no version </para>
- </listitem>
- </itemizedlist>
- </section>
- <section
- xml:id="hbase.when.to.close.JIRA">
- <title>Policy on when to set a RESOLVED JIRA as CLOSED</title>
- <para>We <link
- xlink:href="http://search-hadoop.com/m/4cIKs1iwXMS1">agreed</link> that for issues that
- list multiple releases in their <emphasis>Fix Version/s</emphasis> field, we CLOSE the issue on
- the release of any of the listed versions; subsequent changes to the issue must happen in a
- new JIRA. </para>
- </section>
- <section
- xml:id="no.permanent.state.in.zk">
- <title>Only transient state in ZooKeeper!</title>
- <para> You should be able to wipe the data in ZooKeeper, and HBase should ride over it,
- recreating the ZooKeeper content as it goes. This is an old adage around these parts; we are
- only now writing it down. We are also currently in violation of this basic tenet (replication,
- at least, keeps permanent state in ZooKeeper), but we are working to undo this breaking of a
- golden rule. </para>
- </section>
- </section>
- <section
- xml:id="community.roles">
- <title>Community Roles</title>
- <section
- xml:id="OWNER">
- <title>Component Owner/Lieutenant</title>
- <para> Component owners are listed in the description field on this Apache HBase JIRA <link
- xlink:href="https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel">components</link>
- page. The owners are listed in the 'Description' field rather than in the 'Component Lead'
- field because the latter allows us to list only one individual, whereas components are
- encouraged to have multiple owners. </para>
- <para> Owners or component lieutenants are volunteers who are (usually, but not necessarily)
- experts in their component domain and may have an agenda for how they think their Apache HBase
- component should evolve. </para>
- <orderedlist>
- <title>Component Owner Duties</title>
- <listitem>
- <para> Owners will try to review patches that land within their component's scope.
- </para>
- </listitem>
- <listitem>
- <para> If an owner has an agenda, they will publish their goals or the
- design toward which they are driving their component. </para>
- </listitem>
- </orderedlist>
- <para> If you would like to volunteer as a component owner, just write to the dev list and
- we'll sign you up. Owners do not need to be committers. </para>
- </section>
- </section>
- <section
- xml:id="hbase.commit.msg.format">
- <title>Commit Message format</title>
- <para>We <link
- xlink:href="http://search-hadoop.com/m/Gwxwl10cFHa1">agreed</link> to the following SVN
- commit message format:
- <programlisting>HBASE-xxxxx <title>. (<contributor>)</programlisting> If the person
- making the commit is the contributor, leave off the '(<contributor>)' element. </para>
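- <para> For illustration, the two variants look like this. The first is committed by its own
- author, and the second is committed on behalf of a contributor. The issue number is left as
- a placeholder, and the title and contributor name are hypothetical: </para>
- <programlisting>HBASE-xxxxx Fix typo in RegionServer log message.
-HBASE-xxxxx Fix typo in RegionServer log message. (Jane Doe)</programlisting>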
- </section>
-</chapter>