Posted to commits@hbase.apache.org by dm...@apache.org on 2011/08/24 23:05:39 UTC

svn commit: r1161273 - /hbase/trunk/src/docbkx/performance.xml

Author: dmeil
Date: Wed Aug 24 21:05:38 2011
New Revision: 1161273

URL: http://svn.apache.org/viewvc?rev=1161273&view=rev
Log:
HBASE-4249 - performance.xml (adding network section)

Modified:
    hbase/trunk/src/docbkx/performance.xml

Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1161273&r1=1161272&r2=1161273&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Wed Aug 24 21:05:38 2011
@@ -24,7 +24,59 @@
           <para>Watch out for swapping.  Set swappiness to 0.</para>
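+          <para>For example (illustrative only; verify the setting and persistence mechanism for your distribution), swappiness can be
+          lowered at runtime and persisted across reboots on a typical Linux host:
+          </para>
+          <programlisting>
+# set swappiness for the running kernel (requires root)
+sysctl -w vm.swappiness=0
+# persist the setting across reboots
+echo "vm.swappiness = 0" >> /etc/sysctl.conf
+          </programlisting>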
         </section>
   </section>
-
+  <section xml:id="perf.network">
+    <title>Network</title>
+    <para>
+    Perhaps the most important factor in avoiding network issues that degrade Hadoop and HBase performance is the switching hardware
+    that is used. Decisions made early in the project can cause major problems when you double or triple the size of your cluster (or more).
+    </para>
+    <para>
+    Important items to consider:
+        <itemizedlist>
+          <listitem>Switching capacity of the device</listitem>
+          <listitem>Number of systems connected</listitem>
+          <listitem>Uplink capacity</listitem>
+        </itemizedlist>
+    </para>
+    <section xml:id="perf.network.1switch">
+      <title>Single Switch</title>
+      <para>The single most important factor in this configuration is that the switching capacity of the hardware can handle the
+      traffic generated by all of the systems connected to the switch. Some lower-priced commodity hardware has less switching
+      capacity than a fully-populated switch can generate.
+      </para>
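+      <para>As a rough, illustrative calculation: a fully-populated 24-port gigabit switch carrying line-rate, full-duplex traffic on
+      every port needs about 24 x 1 Gbps x 2, i.e. roughly 48 Gbps, of non-blocking switching capacity; hardware that falls short of
+      this will drop below line rate under load.
+      </para>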
+    </section>
+    <section xml:id="perf.network.2switch">
+      <title>Multiple Switches</title>
+      <para>Multiple switches are a potential pitfall in the architecture.  The most common configuration with lower-priced hardware is a
+      single 1 Gbps uplink from one switch to another. This often-overlooked pinch point can easily become a bottleneck for cluster communication.
+      MapReduce jobs that both read and write a lot of data can easily saturate this uplink.
+      </para>
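+      <para>As an illustrative example, consider two 24-port gigabit switches joined by a single 1 Gbps uplink, with 20 nodes on each
+      switch: the 20 nodes on one side can collectively source roughly 20 Gbps toward the other side, but only 1 Gbps can cross the
+      uplink, so cross-switch traffic is limited to roughly 1/20th of what the hosts can generate.
+      </para>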
+      <para>Mitigation of this issue is fairly simple and can be accomplished in multiple ways:
+      <itemizedlist>
+        <listitem>Use appropriate hardware for the scale of the cluster you are attempting to build.</listitem>
+        <listitem>Use larger single-switch configurations, e.g., a single 48-port switch as opposed to two 24-port switches.</listitem>
+        <listitem>Configure port trunking for uplinks to utilize multiple interfaces to increase cross-switch bandwidth (see the sketch after this list).</listitem>
+      </itemizedlist>
+      </para>
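+      <para>Switch-to-switch trunking itself is configured on the switches and is vendor-specific, but the following sketch shows the
+      same link-aggregation idea on a Linux host (the interface names and address are hypothetical, and 802.3ad/LACP mode assumes the
+      switch ports are configured to match):
+      </para>
+      <programlisting>
+# load the bonding driver in 802.3ad (LACP) mode, checking link state every 100 ms
+modprobe bonding mode=802.3ad miimon=100
+# bring up the bonded interface and enslave two physical NICs
+ifconfig bond0 10.0.0.10 netmask 255.255.255.0 up
+ifenslave bond0 eth0 eth1
+      </programlisting>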
+    </section>
+    <section xml:id="perf.network.multirack">
+      <title>Multiple Racks</title>
+      <para>Multiple rack configurations carry the same potential issues as multiple switches, and can suffer performance degradation from two main areas:
+         <itemizedlist>
+           <listitem>Poor switch capacity performance</listitem>
+           <listitem>Insufficient uplink to another rack</listitem>
+         </itemizedlist>
+      If the switches in your rack have appropriate switching capacity to handle all of the hosts at full speed, the next most likely issue is the uplink
+      between racks as more of your cluster spans them.  The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to the other racks.
+      The downside of this method, however, is the overhead of ports that could otherwise be used for hosts. For example, creating an 8 Gbps port channel from rack
+      A to rack B uses 8 of your 24 ports for inter-rack communication; this is a poor return on the ports, but using too few can mean you are not getting the most out of your cluster.
+      </para>
+      <para>Using 10GbE links between racks will greatly increase performance. If your switches support a 10GbE uplink, or allow for an expansion card, this lets you
+      save your ports for machines as opposed to uplinks.
+      </para>
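+      <para>With illustrative numbers: an 8 x 1 Gbps port channel consumes a third of a 24-port switch for 8 Gbps of inter-rack
+      bandwidth, while a single 10GbE uplink port (or expansion module) provides more bandwidth and leaves all 24 gigabit ports
+      free for hosts.
+      </para>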
+      
+    </section>
+  </section>  <!-- network -->
   <section xml:id="jvm">
     <title>Java</title>
 
@@ -56,7 +108,7 @@
   </section>
 
   <section xml:id="perf.configurations">
-    <title>Configurations</title>
+    <title>HBase Configurations</title>
 
     <para>See <xref linkend="recommended_configurations" />.</para>