Posted to commits@gora.apache.org by le...@apache.org on 2014/09/29 03:07:51 UTC

svn commit: r1628111 - /gora/site/trunk/content/current/index.md

Author: lewismc
Date: Mon Sep 29 01:07:51 2014
New Revision: 1628111

URL: http://svn.apache.org/r1628111
Log:
Update GoraCI documentation

Modified:
    gora/site/trunk/content/current/index.md

Modified: gora/site/trunk/content/current/index.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/index.md?rev=1628111&r1=1628110&r2=1628111&view=diff
==============================================================================
--- gora/site/trunk/content/current/index.md (original)
+++ gora/site/trunk/content/current/index.md Mon Sep 29 01:07:51 2014
@@ -120,34 +120,25 @@ detected.
 As GoraCI is packaged with the Gora master branch source it is automatically 
 built every time you execute
 
-<code>mvn install</code>
+    mvn install
 
 The maven pom file has some profiles that attempt to make it easier to run
 GoraCI against different Gora backends by copying the jars you need into <code>lib</code>.
 Before packaging, it's important to edit <code>gora.properties</code> and set it correctly
 for your datastore.  To run against Accumulo do the following.
 
-<code>
-  vim src/main/resources/gora.properties //set Accumulo properties
-
-  mvn package -Paccumulo-1.4
-</code>
+    vim src/main/resources/gora.properties //set Accumulo properties
+    mvn package -Paccumulo-1.4
 
 To run against HBase, do the following.
 
-<code>
-  vim src/main/resources/gora.properties //set HBase properties
-
-  mvn package -Phbase-0.92
-</code>
+    vim src/main/resources/gora.properties //set HBase properties
+    mvn package -Phbase-0.92
 
 To run against Cassandra, do the following.
 
-<code>
-  vim src/main/resources/gora.properties //set Cassandra properties
-
-  mvn package -Pcassandra-1.1.2
-</code>
+    vim src/main/resources/gora.properties //set Cassandra properties
+    mvn package -Pcassandra-1.1.2
 
 For other datastores mentioned in <code>gora.properties</code>, you will need to copy the
 appropriate deps into <code>lib</code>.  Feel free to update the pom with other profiles, [open
@@ -173,11 +164,8 @@ Below is a description of the Java progr
 assumes all needed jars are in the <code>lib</code> dir.  It does not need the package name.
 You can just run <code>goraci.sh Generator</code>; below is an example.
 
-<code>
-  $ ./goraci.sh Generator
-
-  Usage : Generator <num mappers> <num nodes>
-</code>
+    $ ./goraci.sh Generator
+    Usage : Generator <num mappers> <num nodes>
 
 For Gora to work, it needs a <code>gora.properties</code> file on the classpath and a
 <code>gora-$datastore-mapping.xml</code> mapping file on the classpath, the contents of both are datastore specific,
@@ -186,7 +174,6 @@ and build the <code>goraci-${version}-SN
 those and put them on the classpath through some other means.
 
 ####Gora and Hadoop
-
 Gora uses [Apache Avro](http://avro.apache.org), which uses a JSON library that Hadoop bundles an older version of.
 The two libraries, jackson-core and jackson-mapper, need to be updated in
 <code>$HADOOP_HOME/lib</code> and <code>$HADOOP_HOME/share/hadoop/lib/</code>.  Currently these are updated to
@@ -194,45 +181,36 @@ jackson-core-asl-1.4.2.jar and jackson-m
 [HADOOP-6945](https://issues.apache.org/jira/browse/HADOOP-6945). 
 
 ####GoraCI and HBase
-
 To improve performance running read jobs such as the Verify step, enable
 scanner caching on the command line.  For example:
 
-<code>
     $ ./goraci.sh Verify -Dhbase.client.scanner.caching=1000 \
-         -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
-</code>
+       -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
 
 Depending on how your Hadoop and HBase installations are deployed, you may need to
 modify the <code>goraci.sh</code> script.  Here is one suggestion that may help
 when your Hadoop and HBase configuration files live outside the
 Hadoop and HBase home directories.
 
-<code>
-  diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
-  index db1562a..31c3c94 100755
-  --- a/org.apache.gora.goraci.sh
-  +++ b/org.apache.gora.goraci.sh
-  @@ -95,6 +95,4 @@ done
-   #run it
-   export HADOOP_CLASSPATH="$CLASSPATH"
-   LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
-  -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
-  -
-  -
-  +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR} jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
-</code>
+    diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
+    index db1562a..31c3c94 100755
+    --- a/org.apache.gora.goraci.sh
+    +++ b/org.apache.gora.goraci.sh
+    @@ -95,6 +95,4 @@ done
+     #run it
+     export HADOOP_CLASSPATH="$CLASSPATH"
+     LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
+     -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
+     -
+     -
+     +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR}" jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
 
 You will need to define <code>HBASE_CONF_DIR</code> and <code>HADOOP_CONF_DIR</code> before you run your
 **goraci** jobs.  For example:
 
-<code>
-  $ export HADOOP_CONF_DIR=/home/you/hadoop-conf
-
-  $ export HBASE_CONF_DIR=/home/you/hbase-conf
-
-  $ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
-</code>
+    $ export HADOOP_CONF_DIR=/home/you/hadoop-conf
+    $ export HBASE_CONF_DIR=/home/you/hbase-conf
+    $ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
 
 ####Concurrency
 
@@ -262,47 +240,31 @@ Below shows running a test of the test. 
 in it, ensure the verification map reduce job notices that the node is missing.
 Not all output is shown, just the important parts.
 
-<code>
-  $ ./org.apache.gora.goraci.sh Generator  1 25000000
-
-  $ ./org.apache.gora.goraci.sh Print -s 2000000000000000 -l 1
-
-  2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-
-  $ ./org.apache.gora.goraci.sh Print -s 30350f9ae6f6e8f7 -l 1
-
-  30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-
-  $ ./org.apache.gora.goraci.sh Delete 30350f9ae6f6e8f7
-
-  Delete returned true
-
-  $ ./org.apache.gora.goraci.sh Verify gci_verify_1 2 
-
-  11/12/20 17:12:31 INFO mapred.JobClient:   org.apache.gora.goraci.Verify$Counts
-
-  11/12/20 17:12:31 INFO mapred.JobClient:     UNDEFINED=1
-
-  11/12/20 17:12:31 INFO mapred.JobClient:     REFERENCED=24999998
-
-  11/12/20 17:12:31 INFO mapred.JobClient:     UNREFERENCED=1
-
-  $ hadoop fs -cat gci_verify_1/part\* 30350f9ae6f6e8f7	2000001f65dbd238
-</code>
+    $ ./goraci.sh Generator  1 25000000
+    $ ./goraci.sh Print -s 2000000000000000 -l 1
+      2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
+    $ ./goraci.sh Print -s 30350f9ae6f6e8f7 -l 1
+      30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
+    $ ./goraci.sh Delete 30350f9ae6f6e8f7
+      Delete returned true
+    $ ./goraci.sh Verify gci_verify_1 2 
+      11/12/20 17:12:31 INFO mapred.JobClient:   org.apache.gora.goraci.Verify$Counts
+      11/12/20 17:12:31 INFO mapred.JobClient:     UNDEFINED=1
+      11/12/20 17:12:31 INFO mapred.JobClient:     REFERENCED=24999998
+      11/12/20 17:12:31 INFO mapred.JobClient:     UNREFERENCED=1
+    $ hadoop fs -cat gci_verify_1/part\* 30350f9ae6f6e8f7	2000001f65dbd238
 
 The map reduce job found the one undefined node and gave the node that
 referenced it.
 
 Below are some timing statistics for running GoraCI on a 10-node cluster.
 
-<code>
-  Store           | Task                   | Time    | Undef  | Unref | Ref        
-  ----------------+------------------------+---------+--------+-------+------------
-  accumulo-1.4.0  | Generator 10 100000000 | 40m 16s |    N/A |   N/A |        N/A     
-  accumulo-1.4.0  | Verify /tmp/goraci1 40 |  6m  7s |      0 |     0 | 1000000000  
-  hbase-0.92.1    | Generator 10 100000000 |  2h 44m |    N/A |   N/A |        N/A     
-  hbase-0.92.1    | Verify /tmp/goraci2 40 |  6m 34s |      0 |     0 | 1000000000
-</code>
+    Store           | Task                   | Time    | Undef  | Unref | Ref        
+    ----------------+------------------------+---------+--------+-------+------------
+    accumulo-1.4.0  | Generator 10 100000000 | 40m 16s |    N/A |   N/A |        N/A     
+    accumulo-1.4.0  | Verify /tmp/goraci1 40 |  6m  7s |      0 |     0 | 1000000000  
+    hbase-0.92.1    | Generator 10 100000000 |  2h 44m |    N/A |   N/A |        N/A     
+    hbase-0.92.1    | Verify /tmp/goraci2 40 |  6m 34s |      0 |     0 | 1000000000
 
 HBase and Accumulo are configured differently out-of-the-box.  We used the Accumulo 
 3G, native configuration examples in the [conf/examples](https://github.com/apache/gora/tree/master/gora-goraci/src/main/resources) directory.
@@ -310,9 +272,7 @@ HBase and Accumulo are configured differ
 To provide a comparable memory footprint, we increased the HBase JVM to "-Xmx4000m", 
 and turned on compression for the ci table:
 
-<code>
-create 'ci', {NAME=>'meta', COMPRESSION=>'GZ'}
-</code>
+    create 'ci', {NAME=>'meta', COMPRESSION=>'GZ'}
 
 We also turned down the replication of write-ahead logs to be comparable to Accumulo: