Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/09/17 03:17:28 UTC

[Lucene-hadoop Wiki] Update of "Hbase/PerformanceEvaluation" by stack

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation

The comment on the change is:
Add numbers for new test run

------------------------------------------------------------------------------
  
  == Content ==
   * [#description Tool Description]
-  * [#first_test First Evaluation of Region Server]
+  * [#first_test First Evaluation of Region Server] -- June 8th, 2007
+  * [#second_test Second Evaluation of Region Server] -- September 16th, 2007
  
  [[Anchor(description)]]
  == Tool Description ==
  
- [https://issues.apache.org/jira/browse/HADOOP-1476 HADOOP-1476] adds to HBase {{{src/test}}} the script {{{org.apache.hadoop.hbase.PerformanceEvaluation}}}.  It runs the tests described in ''Performance Evaluation'', Section 7 of the [http://labs.google.com/papers/bigtable.html BigTable paper].  See the citation for test descriptions.  They will not be described below. The script is useful evaluating HBase performance and how well it scales as we add region servers.
+ [https://issues.apache.org/jira/browse/HADOOP-1476 HADOOP-1476] adds to HBase {{{src/test}}} the script {{{org.apache.hadoop.hbase.PerformanceEvaluation}}} (June 12th, 2007).  It runs the tests described in ''Performance Evaluation'', Section 7 of the [http://labs.google.com/papers/bigtable.html BigTable paper].  See the paper for descriptions of the individual tests; they are not repeated below.  The script is useful for evaluating HBase performance and how well it scales as region servers are added.
  
  Here is the current usage for the {{{PerformanceEvaluation}}} script:
  
@@ -47, +48 @@

  $ ant compile-test
  }}}
  
- The above ant target compiles all test classes into {{{${HADOOP_HOME}/build/contrib/hbase/test}}}.  It also generates {{{${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar}}}.  The latter jar includes all HBase test and src classes and has {{{org.apache.hadoop.hbase.PerformanceEvaluation}}} as its {{{Main-Class}}}.  Use the test jar running {{{PerformanceEvaluation}}} on a hadoop cluster.
+ The above ant target compiles all test classes into {{{${HADOOP_HOME}/build/contrib/hbase/test}}}.  It also generates {{{${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar}}}.  The latter jar includes all HBase test and src classes and has {{{org.apache.hadoop.hbase.PerformanceEvaluation}}} as its {{{Main-Class}}}.  Use the test jar when running {{{PerformanceEvaluation}}} on a hadoop cluster (you'd run the client as a mapreduce job when you want to run multiple clients concurrently; a sketch of such an invocation follows).
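+ 
+ Since the jar names {{{PerformanceEvaluation}}} as its {{{Main-Class}}}, it can be launched with the {{{hadoop jar}}} command.  A rough sketch of a multi-client launch -- the test name and client count here are illustrative; see the usage output above for the actual arguments:
+ 
+ {{{$ ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/build/contrib/hbase/hadoop-hbase-test.jar sequentialWrite 4
+ }}}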
  
  Here is how to run a single-client {{{PerformanceEvaluation}}} ''sequentialWrite'' test:
  
@@ -61, +62 @@

  
  For the latter, you will likely have to copy your hbase configurations -- e.g. your {{{${HBASE_HOME}/conf/hbase*.xml}}} files -- to {{{${HADOOP_HOME}/conf}}} and make sure they are replicated across the cluster so the running mapreduce job can find them (in particular, clients need to know the address of the HBase master).  A minimal sketch of the copy follows.
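+ 
+ Here the host name ''slave1'' is illustrative; the second command would be repeated for each node in the cluster:
+ 
+ {{{$ cp ${HBASE_HOME}/conf/hbase*.xml ${HADOOP_HOME}/conf/
+ $ scp ${HADOOP_HOME}/conf/hbase*.xml slave1:${HADOOP_HOME}/conf/
+ }}}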
  
- Note, the mapreduce mode of the testing script works a little different from single client mode.  It does not delete the test table at the end of each run as is done when the script runs in single client mode.  Nor does it pre-run the '''sequentialWrite''' test before its runs the '''sequentialRead''' test (the table needs to be populated with data first before the sequentialRead can run).  For the mapreduce version, the onus is on the operator to organize the correct order in which to run the jobs.  To delete a table, use the hbase client.
+ Note, the mapreduce mode of the testing script works a little differently from single-client mode.  It does not delete the test table at the end of each run as is done when the script runs in single-client mode.  Nor does it pre-run the '''sequentialWrite''' test before it runs the '''sequentialRead''' test (the table needs to be populated with data before the sequentialRead can run).  In the mapreduce version, the onus is on the operator to run the jobs in the correct order -- e.g. a '''sequentialWrite''' job must complete before a '''sequentialRead''' job is launched.  To delete a table, use the hbase shell and run the drop table command (run {{{help;}}} after starting the shell for details).
  
  
- {{{$ ${HBASE_HOME}/bin/hbase ciient listTables
+ {{{$ ${HBASE_HOME}/bin/hbase shell
- $ ${HBASE_HOME}/bin/hbase ciient deleteTable TestTable
  }}}
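+ 
+ Once in the shell, the drop looks something like the following -- the exact command syntax here is an assumption, so confirm it with {{{help;}}}:
+ 
+ {{{hbase> drop table TestTable;
+ }}}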
  
  
@@ -90, +90 @@

  
  More to follow after more analysis.
  
+ [[Anchor(second_test)]]
+ == One Region Server on September 16th, 2007 ==
+ Ran the same setup as for the first test above, on the same machines.  The main performance improvement in hbase is that batch updates are now sent to the server by the client only on commit, where before each batch operation -- start, put, commit -- required its own trip to the server.  This change cuts the number of trips to the server by at least two-thirds: three round trips become one.  Otherwise, the client/server communication has been changed to pass raw bytes rather than an object wrapping bytes where it makes sense, for some savings in RPCing.
+ 
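+ To make the change concrete, here is a minimal sketch of the client-side pattern in question.  The {{{startUpdate}}}/{{{put}}}/{{{commit}}} names follow the start, put, commit operations described above; treat the exact signatures of the {{{HTable}}} client of this era as approximate:
+ 
+ {{{import org.apache.hadoop.hbase.HBaseConfiguration;
+ import org.apache.hadoop.hbase.HTable;
+ import org.apache.hadoop.io.Text;
+ 
+ public class BatchUpdateSketch {
+   public static void main(String[] args) throws Exception {
+     HTable table = new HTable(new HBaseConfiguration(), new Text("TestTable"));
+     // Formerly each of startUpdate, put, and commit made its own trip to
+     // the region server; now the start and puts are buffered client-side
+     // and the whole batch travels in the single trip made at commit time.
+     long lockid = table.startUpdate(new Text("row0"));
+     table.put(lockid, new Text("info:data"), "value".getBytes());
+     table.commit(lockid);
+   }
+ }
+ }}}
+ 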
+ Here is an example of the kind of command run (this one runs the ''randomRead'' test with a single client):
+ {{{$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 1
+ }}}
+ 
+ 
+ ||Experiment||HBase20070708||HBase20070916||!BigTable||
+ ||random reads ||68||272||1212||
+ ||random reads (mem)||Not implemented||Not implemented||10811||
+ ||random writes||847||1460||8850||
+ ||sequential reads||301||267||4425||
+ ||sequential writes||850||1278||8547||
+ ||scans||3063||3692||15385||
+ 
+ The table above lists the number of 1000-byte rows read or written per second in each case.
+ 
+ Random reads are 4x faster (272 vs. 68 rows/second), random writes ~70% faster, sequential writes ~50% faster, and scans ~20% faster; sequential reads regressed slightly (301 to 267).  Still a long way to go...
+