Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/09/11 13:04:45 UTC

[GitHub] [accumulo-website] keith-turner commented on a change in pull request #192: Add blog post about storing Accumulo data in S3

keith-turner commented on a change in pull request #192: Add blog post about storing Accumulo data in S3
URL: https://github.com/apache/accumulo-website/pull/192#discussion_r323228128
 
 

 ##########
 File path: _posts/blog/2019-09-10-accumulo-S3-notes.md
 ##########
 @@ -0,0 +1,145 @@
+---
+title: "Using S3 as a data store for Accumulo"
+author: Keith Turner
+---
+
+Accumulo can store its files in S3; however, S3 does not support the
+durability and consistency guarantees that write ahead logs and the Accumulo
+metadata table need.  One way to solve this problem is to store the metadata
+table and write ahead logs in HDFS and everything else in S3.  This post shows
+how to do that using Accumulo 2.0 and Hadoop 3.2.0.  Running on S3 requires a
+new feature in Accumulo 2.0: volume choosers are now aware of write ahead logs.
+
+## Hadoop setup
+
+At least the following settings should be added to Hadoop's `core-site.xml` file on each node in the cluster. 
+
+```xml
+<property>
+  <name>fs.s3a.access.key</name>
+  <value>KEY</value>
+</property>
+<property>
+  <name>fs.s3a.secret.key</name>
+  <value>SECRET</value>
+</property>
+<!-- without this setting, Accumulo tservers can exhaust the default S3A connection pool when opening many files -->
+<property>
+  <name>fs.s3a.connection.maximum</name>
+  <value>128</value>
+</property>
+```
+
+See [S3A docs](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#S3A)
+for more S3A settings.  To make the `hadoop` command work with S3, set
+`export HADOOP_OPTIONAL_TOOLS="hadoop-aws"` in `hadoop-env.sh`.
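+
+Once those settings are in place, a quick sanity check is to list a bucket you
+control with the `hadoop` command; `<bucket>` below is a placeholder for your
+bucket name.
+
+```bash
+# should list the bucket's contents without credential or classpath errors
+hadoop fs -ls s3a://<bucket>/
+```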
+
+When trying to use Accumulo with Hadoop's AWS jar, [HADOOP-16080] was
+encountered.  The following instructions build a relocated hadoop-aws jar as a
+workaround.  After building the jar, copy it to all nodes in the cluster.
+
+```bash
+mkdir -p /tmp/haws-reloc
+cd /tmp/haws-reloc
+# get the Maven pom file that builds a relocated jar
+wget https://gist.githubusercontent.com/keith-turner/f6dcbd33342732e42695d66509239983/raw/714cb801eb49084e0ceef5c6eb4027334fd51f87/pom.xml
+mvn package -Dhadoop.version=<your hadoop version>
+# the new jar will be in target
+ls target/
+```
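+
+After the build completes, one way to distribute the jar is a small loop over
+the cluster hosts.  This sketch assumes a `hosts.txt` file listing the nodes
+and the `/somedir` destination used below; both are illustrative.
+
+```bash
+# copy the relocated jar to the same location on every node
+for host in $(cat hosts.txt); do
+  scp target/hadoop-aws-relocated*.jar "$host":/somedir/
+done
+```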
+
+## Accumulo setup
+
+For each node in the cluster, modify `accumulo-env.sh` to add the S3 jars to
+the classpath.  Your versions may differ depending on your Hadoop version; the
+following versions were included with Hadoop 3.2.0.
+
+```bash
+CLASSPATH="${conf}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:${HADOOP_HOME}/share/hadoop/client/*"
+CLASSPATH="${CLASSPATH}:/somedir/hadoop-aws-relocated.3.2.0.jar"
+CLASSPATH="${CLASSPATH}:${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar"
+# The following are dependencies needed by the previous jars and are subject to change
+CLASSPATH="${CLASSPATH}:${HADOOP_HOME}/share/hadoop/common/lib/jaxb-api-2.2.11.jar"
+CLASSPATH="${CLASSPATH}:${HADOOP_HOME}/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar"
+CLASSPATH="${CLASSPATH}:${HADOOP_HOME}/share/hadoop/common/lib/commons-lang3-3.7jar"
+export CLASSPATH
+```
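+
+One way to confirm the jars were picked up is to print the classpath Accumulo
+computes and look for the AWS entries; the `grep` pattern is just illustrative.
+
+```bash
+accumulo classpath | grep aws
+```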
+
+Set the following in `accumulo.properties` and then run `accumulo init`, but don't start Accumulo.
+
+```
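+# What follows is a hedged sketch; the exact properties are elided in this
+# diff excerpt.  Initializing with an HDFS-only volume keeps the metadata
+# table and write ahead logs in HDFS; <name node> is a placeholder.
+instance.volumes=hdfs://<name node>/accumulo
+#
+# After `accumulo init`, the S3 volume can be added and a volume chooser used
+# to keep write ahead logs in HDFS (property names assume Accumulo 2.0's
+# PreferredVolumeChooser; <bucket> is a placeholder):
+#
+#   instance.volumes=hdfs://<name node>/accumulo,s3a://<bucket>/accumulo
+#   general.volume.chooser=org.apache.accumulo.server.fs.PreferredVolumeChooser
+#   general.custom.volume.preferred.default=s3a://<bucket>/accumulo
+#   general.custom.volume.preferred.logger=hdfs://<name node>/accumulo
+```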
 
 Review comment:
   That worked out nicely.
