Posted to commits@gora.apache.org by le...@apache.org on 2014/09/29 03:07:51 UTC
svn commit: r1628111 - /gora/site/trunk/content/current/index.md
Author: lewismc
Date: Mon Sep 29 01:07:51 2014
New Revision: 1628111
URL: http://svn.apache.org/r1628111
Log:
Update GoraCI documentation
Modified:
gora/site/trunk/content/current/index.md
Modified: gora/site/trunk/content/current/index.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/index.md?rev=1628111&r1=1628110&r2=1628111&view=diff
==============================================================================
--- gora/site/trunk/content/current/index.md (original)
+++ gora/site/trunk/content/current/index.md Mon Sep 29 01:07:51 2014
@@ -120,34 +120,25 @@ detected.
As GoraCI is packaged with the Gora master branch source it is automatically
built every time you execute
-<code>mvn install</code>
+ mvn install
The maven pom file has some profiles that attempt to make it easier to run
GoraCI against different Gora backends by copying the jars you need into <code>lib</code>.
Before packaging, it's important to edit <code>gora.properties</code> and set it correctly
for your datastore. To run against Accumulo do the following.
-<code>
- vim src/main/resources/gora.properties //set Accumulo properties
-
- mvn package -Paccumulo-1.4
-</code>
+ vim src/main/resources/gora.properties //set Accumulo properties
+ mvn package -Paccumulo-1.4
To run against HBase, do the following.
-<code>
- vim src/main/resources/gora.properties //set HBase properties
-
- mvn package -Phbase-0.92
-</code>
+ vim src/main/resources/gora.properties //set HBase properties
+ mvn package -Phbase-0.92
To run against Cassandra, do the following.
-<code>
- vim src/main/resources/gora.properties //set Cassandra properties
-
- mvn package -Pcassandra-1.1.2
-</code>
+ vim src/main/resources/gora.properties //set Cassandra properties
+ mvn package -Pcassandra-1.1.2
For other datastores mentioned in <code>gora.properties</code>, you will need to copy the
appropriate deps into <code>lib</code>. Feel free to update the pom with other profiles, [open
@@ -173,11 +164,8 @@ Below is a description of the Java progr
assumes all needed jars are in the <code>lib</code> dir. It does not need the package name.
You can just run <code>goraci.sh Generator</code>; below is an example.
-<code>
- $ ./goraci.sh Generator
-
- Usage : Generator <num mappers> <num nodes>
-</code>
+ $ ./goraci.sh Generator
+ Usage : Generator <num mappers> <num nodes>
For Gora to work, it needs a <code>gora.properties</code> file on the classpath and a
<code>gora-$datastore-mapping.xml</code> mapping file on the classpath, the contents of both are datastore specific,
@@ -186,7 +174,6 @@ and build the <code>goraci-${version}-SN
those and put them on the classpath through some other means.
####Gora and Hadoop
-
Gora uses [Apache Avro](http://avro.apache.org), which depends on a JSON library that Hadoop ships an older version of.
The two libraries jackson-core and jackson-mapper need to be updated in
<code>$HADOOP_HOME/lib</code> and <code>$HADOOP_HOME/share/hadoop/lib/</code>. Currently these are updated to
@@ -194,45 +181,36 @@ jackson-core-asl-1.4.2.jar and jackson-m
[HADOOP-6945](https://issues.apache.org/jira/browse/HADOOP-6945).
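
Before replacing anything, it can help to see which Jackson jars a Hadoop install currently carries. The following is only a sketch: the helper name <code>list_jackson_jars</code> is hypothetical and not part of GoraCI, and it assumes the jars follow the <code>jackson-*.jar</code> naming used above.

```shell
# Hypothetical helper (not part of GoraCI): list the Jackson jars found
# directly inside each directory given as an argument.
list_jackson_jars() {
  for dir in "$@"; do
    # Skip directories that do not exist in this Hadoop layout.
    [ -d "$dir" ] || continue
    find "$dir" -maxdepth 1 -name 'jackson-*.jar' | sort
  done
}
```

For example, <code>list_jackson_jars "$HADOOP_HOME/lib" "$HADOOP_HOME/share/hadoop/lib"</code> prints one path per jar, so the version in each file name can be compared with the one Gora needs.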
####GoraCI and HBase
-
To improve performance running read jobs such as the Verify step, enable
scanner caching on the command line. For example:
-<code>
$ ./goraci.sh Verify -Dhbase.client.scanner.caching=1000 \
- -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
-</code>
+ -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
Depending on how your Hadoop and HBase installations are deployed, you may need to
adjust the <code>goraci.sh</code> script. Here is one suggestion that may help
when your Hadoop and HBase configuration lives somewhere other than under the
Hadoop and HBase home directories.
-<code>
- diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
- index db1562a..31c3c94 100755
- --- a/org.apache.gora.goraci.sh
- +++ b/org.apache.gora.goraci.sh
- @@ -95,6 +95,4 @@ done
- #run it
- export HADOOP_CLASSPATH="$CLASSPATH"
- LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
- -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
- -
- -
- +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR}" jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
-</code>
+ diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
+ index db1562a..31c3c94 100755
+ --- a/org.apache.gora.goraci.sh
+ +++ b/org.apache.gora.goraci.sh
+ @@ -95,6 +95,4 @@ done
+ #run it
+ export HADOOP_CLASSPATH="$CLASSPATH"
+ LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
+ -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
+ -
+ -
+ +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR}" jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
You will need to define <code>HBASE_CONF_DIR</code> and <code>HADOOP_CONF_DIR</code> before you run your
**goraci** jobs. For example:
-<code>
- $ export HADOOP_CONF_DIR=/home/you/hadoop-conf
-
- $ export HBASE_CONF_DIR=/home/you/hbase-conf
-
- $ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
-</code>
+ $ export HADOOP_CONF_DIR=/home/you/hadoop-conf
+ $ export HBASE_CONF_DIR=/home/you/hbase-conf
+ $ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
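
Since a job started without these variables fails late and not very clearly, a wrapper can check them up front. The sketch below is an illustration, not part of the shipped script: <code>require_conf_dir</code> and <code>classpath_to_libjars</code> are hypothetical names, and the second function only mirrors the <code>tr : ,</code> step that the script uses to build <code>LIBJARS</code>.

```shell
# Hypothetical helper: succeed only if the variable whose NAME is passed
# as $1 (for example HADOOP_CONF_DIR) is set and points to a directory.
require_conf_dir() {
  eval "conf_dir_value=\${$1:-}"
  if [ -z "$conf_dir_value" ]; then
    echo "$1 is not set" >&2
    return 1
  fi
  if [ ! -d "$conf_dir_value" ]; then
    echo "$1 does not point to a directory: $conf_dir_value" >&2
    return 1
  fi
}

# Same transformation the script applies to HADOOP_CLASSPATH:
# a colon-separated classpath becomes a comma-separated -libjars list.
classpath_to_libjars() {
  printf '%s' "$1" | tr : ,
}
```

A wrapper might call <code>require_conf_dir HADOOP_CONF_DIR && require_conf_dir HBASE_CONF_DIR</code> before invoking <code>goraci.sh</code>.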
####Concurrency
@@ -262,47 +240,31 @@ Below shows running a test of the test.
in it, ensure the verification map reduce job notices that the node is missing.
Not all output is shown, just the important parts.
-<code>
- $ ./org.apache.gora.goraci.sh Generator 1 25000000
-
- $ ./org.apache.gora.goraci.sh Print -s 2000000000000000 -l 1
-
- 2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-
- $ ./org.apache.gora.goraci.sh Print -s 30350f9ae6f6e8f7 -l 1
-
- 30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-
- $ ./org.apache.gora.goraci.sh Delete 30350f9ae6f6e8f7
-
- Delete returned true
-
- $ ./org.apache.gora.goraci.sh Verify gci_verify_1 2
-
- 11/12/20 17:12:31 INFO mapred.JobClient: org.apache.gora.goraci.Verify$Counts
-
- 11/12/20 17:12:31 INFO mapred.JobClient: UNDEFINED=1
-
- 11/12/20 17:12:31 INFO mapred.JobClient: REFERENCED=24999998
-
- 11/12/20 17:12:31 INFO mapred.JobClient: UNREFERENCED=1
-
- $ hadoop fs -cat gci_verify_1/part\* 30350f9ae6f6e8f7 2000001f65dbd238
-</code>
+ $ ./goraci.sh Generator 1 25000000
+ $ ./goraci.sh Print -s 2000000000000000 -l 1
+ 2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
+ $ ./goraci.sh Print -s 30350f9ae6f6e8f7 -l 1
+ 30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
+ $ ./goraci.sh Delete 30350f9ae6f6e8f7
+ Delete returned true
+ $ ./goraci.sh Verify gci_verify_1 2
+ 11/12/20 17:12:31 INFO mapred.JobClient: org.apache.gora.goraci.Verify$Counts
+ 11/12/20 17:12:31 INFO mapred.JobClient: UNDEFINED=1
+ 11/12/20 17:12:31 INFO mapred.JobClient: REFERENCED=24999998
+ 11/12/20 17:12:31 INFO mapred.JobClient: UNREFERENCED=1
+ $ hadoop fs -cat gci_verify_1/part\* 30350f9ae6f6e8f7 2000001f65dbd238
The map reduce job found the one undefined node and gave the node that
referenced it.
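
The counter check can also be scripted. The sketch below reads job output in the format shown above and pulls out a counter value. The function names are hypothetical, and treating a missing <code>UNDEFINED</code> counter as zero is an assumption about how the counters are reported, not something GoraCI guarantees.

```shell
# Print the value of the counter named $1 from job output on stdin.
# Expects lines such as "... INFO mapred.JobClient: UNDEFINED=1".
# The whitespace required before the name keeps REFERENCED from
# matching inside UNREFERENCED.
counter_value() {
  sed -n "s/.*[[:space:]]$1=\([0-9][0-9]*\)[[:space:]]*\$/\1/p" | tail -n 1
}

# Succeed only if the output on stdin reports no undefined nodes.
# Assumption: an absent UNDEFINED counter means zero.
no_undefined_nodes() {
  undefined=$(counter_value UNDEFINED)
  [ "${undefined:-0}" -eq 0 ]
}
```

Assuming the counters reach the terminal as in the log above, something like <code>./goraci.sh Verify gci_verify_1 2 2>&1 | no_undefined_nodes</code> would return a non-zero status for the run shown here.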
Below are some timing statistics for running Goraci on a 10 node cluster.
-<code>
- Store | Task | Time | Undef | Unref | Ref
- ----------------+------------------------+---------+--------+-------+------------
- accumulo-1.4.0 | Generator 10 100000000 | 40m 16s | N/A | N/A | N/A
- accumulo-1.4.0 | Verify /tmp/goraci1 40 | 6m 7s | 0 | 0 | 1000000000
- hbase-0.92.1 | Generator 10 100000000 | 2h 44m | N/A | N/A | N/A
- hbase-0.92.1 | Verify /tmp/goraci2 40 | 6m 34s | 0 | 0 | 1000000000
-</code>
+ Store | Task | Time | Undef | Unref | Ref
+ ----------------+------------------------+---------+--------+-------+------------
+ accumulo-1.4.0 | Generator 10 100000000 | 40m 16s | N/A | N/A | N/A
+ accumulo-1.4.0 | Verify /tmp/goraci1 40 | 6m 7s | 0 | 0 | 1000000000
+ hbase-0.92.1 | Generator 10 100000000 | 2h 44m | N/A | N/A | N/A
+ hbase-0.92.1 | Verify /tmp/goraci2 40 | 6m 34s | 0 | 0 | 1000000000
HBase and Accumulo are configured differently out-of-the-box. We used the Accumulo
3G, native configuration examples in the [conf/examples](https://github.com/apache/gora/tree/master/gora-goraci/src/main/resources) directory.
@@ -310,9 +272,7 @@ HBase and Accumulo are configured differ
To provide a comparable memory footprint, we increased the HBase JVM heap size to "-Xmx4000m",
and turned on compression for the ci table:
-<code>
-create 'ci', {NAME=>'meta', COMPRESSION=>'GZ'}
-</code>
+ create 'ci', {NAME=>'meta', COMPRESSION=>'GZ'}
We also turned down the replication of write-ahead logs to be comparable to Accumulo: