Posted to commits@gora.apache.org by bu...@apache.org on 2014/09/29 02:59:40 UTC

svn commit: r923975 - in /websites/staging/gora/trunk/content: ./ current/index.html

Author: buildbot
Date: Mon Sep 29 00:59:39 2014
New Revision: 923975

Log:
Staging update by buildbot for gora

Modified:
    websites/staging/gora/trunk/content/   (props changed)
    websites/staging/gora/trunk/content/current/index.html

Propchange: websites/staging/gora/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Sep 29 00:59:39 2014
@@ -1 +1 @@
-1628109
+1628110

Modified: websites/staging/gora/trunk/content/current/index.html
==============================================================================
--- websites/staging/gora/trunk/content/current/index.html (original)
+++ websites/staging/gora/trunk/content/current/index.html Mon Sep 29 00:59:39 2014
@@ -169,13 +169,13 @@ under the License. 
 <li><a href="#building-goraci">Building GoraCI</a></li>
 <li><a href="#java-class-description">Java Class Description</a></li>
 <li><a href="#gora-and-hadoop">Gora and Hadoop</a></li>
+<li><a href="#goraci-and-hbase">GoraCI and HBase</a></li>
+<li><a href="#concurrency">Concurrency</a></li>
+<li><a href="#conclusions">Conclusions</a></li>
 </ul>
 </li>
 </ul>
 </li>
-<li><a href="#goraci-and-hbase">GORACI AND HBASE</a></li>
-<li><a href="#concurrency">CONCURRENCY</a></li>
-<li><a href="#conclusions">CONCLUSIONS</a></li>
 </ul>
 </div>
 <p>This is the main entry point for Gora documentation. Here are some pointers for further info:</p>
@@ -282,18 +282,18 @@ GoraCI against different Gora backends b
 Before packaging it's important to edit <code>gora.properties</code> and set it correctly
 for your datastore.  To run against Accumulo do the following.</p>
 <p><code>
-  vim src/main/resources/gora.properties //set Accumulo properties
-  mvn package -Paccumulo-1.4
+  vim src/main/resources/gora.properties //set Accumulo properties</p>
+<p>mvn package -Paccumulo-1.4
 </code></p>
 <p>To run against HBase, do the following.</p>
 <p><code>
-  vim src/main/resources/gora.properties //set HBase properties
-  mvn package -Phbase-0.92
+  vim src/main/resources/gora.properties //set HBase properties</p>
+<p>mvn package -Phbase-0.92
 </code></p>
 <p>To run against Cassandra, do the following.</p>
 <p><code>
-  vim src/main/resources/gora.properties //set Cassandra properties
-  mvn package -Pcassandra-1.1.2
+  vim src/main/resources/gora.properties //set Cassandra properties</p>
+<p>mvn package -Pcassandra-1.1.2
 </code></p>
 <p>For other datastores mentioned in <code>gora.properties</code>, you will need to copy the
 appropriate deps into <code>lib</code>.  Feel free to update the pom with other profiles, <a href="https://issues.apache.org/jira/browse/GORA/">open
@@ -317,8 +317,8 @@ a ticket</a> or just <a href="https://gi
 assumes all needed jars are in the <code>lib</code> dir.  It does not need the package name.
 You can just run <code>goraci.sh Generator</code>; below is an example.</p>
 <p><code>
-  $ ./goraci.sh Generator
-  Usage : Generator <num mappers> <num nodes>
+  $ ./goraci.sh Generator</p>
+<p>Usage : Generator <num mappers> <num nodes>
 </code></p>
 <p>For Gora to work, it needs a <code>gora.properties</code> file on the classpath and a
 <code>gora-$datastore-mapping.xml</code> mapping file on the classpath, the contents of both are datastore specific,
@@ -331,19 +331,19 @@ The two libraries  jackson-core and jack
 <code>$HADOOP_HOME/lib</code> and <code>$HADOOP_HOME/share/hadoop/lib/</code>.  Currently these are updated to
 jackson-core-asl-1.4.2.jar and jackson-mapper-asl-1.4.2.jar.  For details see
 <a href="https://issues.apache.org/jira/browse/HADOOP-6945">HADOOP-6945</a>. </p>
-<h2 id="goraci-and-hbase">GORACI AND HBASE</h2>
+<h4 id="goraci-and-hbase">GoraCI and HBase</h4>
 <p>To improve performance running read jobs such as the Verify step, enable
 scanner caching on the command line.  For example:</p>
-<div class="codehilite"><pre>$ <span class="o">./</span><span class="n">gorachi</span><span class="p">.</span><span class="n">sh</span> <span class="n">Verify</span> <span class="o">-</span><span class="n">Dhbase</span><span class="p">.</span><span class="n">client</span><span class="p">.</span><span class="n">scanner</span><span class="p">.</span><span class="n">caching</span><span class="p">=</span>1000 <span class="o">\</span>
-     <span class="o">-</span><span class="n">Dmapred</span><span class="p">.</span><span class="n">map</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="n">speculative</span><span class="p">.</span><span class="n">execution</span><span class="p">=</span><span class="n">false</span> <span class="n">verify_dir</span> 1000
-</pre></div>
-
-
-<p>Dependent on how you have your hadoop and hbase deployed, you may need to
-change the gorachi.sh script around some.  Here is one suggestion that may help
-in the case where your hadoop and hbase configuration are other than under the
-hadoop and hbase home directories.</p>
-<p>diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
+<p><code>
+    $ ./goraci.sh Verify -Dhbase.client.scanner.caching=1000 \
+         -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
+</code></p>
+<p>Depending on how your Hadoop and HBase setup is deployed, you may need to
+adjust the <code>goraci.sh</code> script.  Here is one suggestion that may help
+when your Hadoop and HBase configuration lives somewhere other than under the
+Hadoop and HBase home directories.</p>
+<p><code>
+  diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
   index db1562a..31c3c94 100755
   --- a/org.apache.gora.goraci.sh
   +++ b/org.apache.gora.goraci.sh
@@ -354,78 +354,88 @@ hadoop and hbase home directories.</p>
   -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
   -
   -
-  +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR} jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"</p>
-<p>You will need to define HBASE_CONF_DIR and HADOOP_CONF_DIR before you run your
-org.apache.gora.goraci jobs.  For example:</p>
-<p>$ export HADOOP_CONF_DIR=/home/you/hadoop-conf
-  $ export HBASE_CONF_DIR=/home/you/hbase-conf
-  $ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./org.apache.gora.goraci.sh Generator 1000 1000000</p>
-<h2 id="concurrency">CONCURRENCY</h2>
+  +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR}" jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
+</code></p>
+<p>You will need to define <code>HBASE_CONF_DIR</code> and <code>HADOOP_CONF_DIR</code> before you run your
+<strong>goraci</strong> jobs.  For example:</p>
+<p><code>
+  $ export HADOOP_CONF_DIR=/home/you/hadoop-conf</p>
+<p>$ export HBASE_CONF_DIR=/home/you/hbase-conf</p>
+<p>$ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
+</code></p>
+<h4 id="concurrency">Concurrency</h4>
 <p>It's possible to run verification at the same time as generation.  To do this
 supply the -c option to Generator and Verify.  This will cause Generator to
 create a secondary table which holds information about what verification can
-safely verify.  Running Verify with the -c option will make it run slower
+safely verify.  Running Verify with the <strong>-c</strong> option will make it run slower
 because more information must be brought back to the client side for filtering
 purposes.  The Loop program also supports the -c option, which will cause it to
 run verification concurrently with generation.</p>
-<p>If verification is run at the same time as generation without the -c option,
+<p>If verification is run at the same time as generation without the <strong>-c</strong> option,
 then it will inevitably fail.  This is because verification mappers read
 different parts of the table at different times, giving an inconsistent view
 of the table.  So one mapper may read a part of a table before a node is
 written; when the node is later referenced, it will appear to be missing.  The
--c option basically filters out newer information using data written to the
+<strong>-c</strong> option basically filters out newer information using data written to the
 secondary table.</p>
-<h2 id="conclusions">CONCLUSIONS</h2>
+<h4 id="conclusions">Conclusions</h4>
 <p>This test suite does not do everything that the Accumulo test suite does,
 mainly it does not collect statistics and generate reports.  The reports
 are useful for assessing performance.</p>
 <p>Below is a test of the test: ingest one linked list, delete a node
 in it, and ensure the verification map reduce job notices that the node is missing.
 Not all output is shown, just the important parts.</p>
-<p>$ ./org.apache.gora.goraci.sh Generator  1 25000000
-  $ ./org.apache.gora.goraci.sh Print -s 2000000000000000 -l 1
-  2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-  $ ./org.apache.gora.goraci.sh Print -s 30350f9ae6f6e8f7 -l 1
-  30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-  $ ./org.apache.gora.goraci.sh Delete 30350f9ae6f6e8f7
-  Delete returned true
-  $ ./org.apache.gora.goraci.sh Verify gci_verify_1 2 
-  11/12/20 17:12:31 INFO mapred.JobClient:   org.apache.gora.goraci.Verify$Counts
-  11/12/20 17:12:31 INFO mapred.JobClient:     UNDEFINED=1
-  11/12/20 17:12:31 INFO mapred.JobClient:     REFERENCED=24999998
-  11/12/20 17:12:31 INFO mapred.JobClient:     UNREFERENCED=1
-  $ hadoop fs -cat gci_verify_1/part*
-  30350f9ae6f6e8f7  2000001f65dbd238</p>
+<p><code>
+  $ ./org.apache.gora.goraci.sh Generator  1 25000000</p>
+<p>$ ./org.apache.gora.goraci.sh Print -s 2000000000000000 -l 1</p>
+<p>2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6</p>
+<p>$ ./org.apache.gora.goraci.sh Print -s 30350f9ae6f6e8f7 -l 1</p>
+<p>30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6</p>
+<p>$ ./org.apache.gora.goraci.sh Delete 30350f9ae6f6e8f7</p>
+<p>Delete returned true</p>
+<p>$ ./org.apache.gora.goraci.sh Verify gci_verify_1 2 </p>
+<p>11/12/20 17:12:31 INFO mapred.JobClient:   org.apache.gora.goraci.Verify$Counts</p>
+<p>11/12/20 17:12:31 INFO mapred.JobClient:     UNDEFINED=1</p>
+<p>11/12/20 17:12:31 INFO mapred.JobClient:     REFERENCED=24999998</p>
+<p>11/12/20 17:12:31 INFO mapred.JobClient:     UNREFERENCED=1</p>
+<p>$ hadoop fs -cat gci_verify_1/part* 30350f9ae6f6e8f7 2000001f65dbd238
+</code></p>
 <p>The map reduce job found the one undefined node and gave the node that
 referenced it.</p>
-<p>Below are some timing statistics for running org.apache.gora.goraci on a 10 node cluster. </p>
-<p>Store           | Task                   | Time    | Undef  | Unref | Ref      <br />
+<p>Below are some timing statistics for running GoraCI on a 10-node cluster. </p>
+<p><code>
+  Store           | Task                   | Time    | Undef  | Unref | Ref      <br />
   ----------------+------------------------+---------+--------+-------+------------
   accumulo-1.4.0  | Generator 10 100000000 | 40m 16s |    N/A |   N/A |        N/A   <br />
   accumulo-1.4.0  | Verify /tmp/goraci1 40 |  6m  7s |      0 |     0 | 1000000000<br />
   hbase-0.92.1    | Generator 10 100000000 |  2h 44m |    N/A |   N/A |        N/A   <br />
-  hbase-0.92.1    | Verify /tmp/goraci2 40 |  6m 34s |      0 |     0 | 1000000000</p>
-<p>Hbase and Accumulo are configured differently out-of-the-box.  We used the Accumulo 
-3G, native configuration examples in the conf/examples directory.</p>
+  hbase-0.92.1    | Verify /tmp/goraci2 40 |  6m 34s |      0 |     0 | 1000000000
+</code></p>
+<p>HBase and Accumulo are configured differently out-of-the-box.  We used the Accumulo 
+3G, native configuration examples in the <a href="https://github.com/apache/gora/tree/master/gora-goraci/src/main/resources">conf/examples</a> directory.</p>
 <p>To provide a comparable memory footprint, we increased the HBase JVM to "-Xmx4000m", 
 and turned on compression for the ci table:</p>
-<p>create 'ci', {NAME=&gt;'meta', COMPRESSION=&gt;'GZ'}</p>
+<p><code>
+create 'ci', {NAME=&gt;'meta', COMPRESSION=&gt;'GZ'}
+</code></p>
 <p>We also turned down the replication of write-ahead logs to be comparable to Accumulo:</p>
-<p><property>
-    <name>hbase.regionserver.hlog.replication</name>
-    <value>2</value>
-  </property></p>
+<div class="codehilite"><pre><span class="nt">&lt;property&gt;</span>
+  <span class="nt">&lt;name&gt;</span>hbase.regionserver.hlog.replication<span class="nt">&lt;/name&gt;</span>
+  <span class="nt">&lt;value&gt;</span>2<span class="nt">&lt;/value&gt;</span>
+<span class="nt">&lt;/property&gt;</span>
+</pre></div>
+
+
 <p>For the Accumulo run, we set the split threshold to 512M:</p>
-<p>shell&gt; config -t ci -s table.split.threshold=512M</p>
+<div class="codehilite"><pre><span class="n">shell</span><span class="o">&gt;</span> <span class="n">config</span> <span class="o">-</span><span class="n">t</span> <span class="n">ci</span> <span class="o">-</span><span class="n">s</span> <span class="n">table</span><span class="p">.</span><span class="n">split</span><span class="p">.</span><span class="n">threshold</span><span class="p">=</span>512<span class="n">M</span>
+</pre></div>
+
+
 <p>This was done so that Accumulo would end up with 64 tablets, which is the
-number of regions hbase had.   The number of tablets/regions determines how
+number of regions HBase had. The number of tablets/regions determines how
 much parallelism there is in the map phase of the verify step.</p>
 <p>Sometimes, when this test suite is run against HBase, data is lost.  This issue
-is being tracked under HBASE-5754 [4].</p>
-<p>[0] http://accumulo.apache.org
-[1] http://gora.apache.org
-[2] http://gora.apache.org/docs/current/gora-conf.html</p>
-<p>[4] https://issues.apache.org/jira/browse/HBASE-5754</p>
+is being tracked under <a href="https://issues.apache.org/jira/browse/HBASE-5754">HBASE-5754</a>.</p>
 
   </div> <!-- /container (main block) -->