Posted to commits@accumulo.apache.org by mw...@apache.org on 2016/12/13 14:17:42 UTC

accumulo-wikisearch git commit: Updated wikisearch documentation

Repository: accumulo-wikisearch
Updated Branches:
  refs/heads/master 9c30660f6 -> 7fdf1bebb


Updated wikisearch documentation

* Made documentation use markdown
* Combined regular and parellel install instructions
* Moved install instructions to INSTALL.md
* Pulled in design/performance documentation from website


Project: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/commit/7fdf1beb
Tree: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/tree/7fdf1beb
Diff: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/diff/7fdf1beb

Branch: refs/heads/master
Commit: 7fdf1bebb2e2b4ca31d58d2d7fc8de8f157a63f3
Parents: 9c30660
Author: Mike Walch <mw...@apache.org>
Authored: Mon Dec 12 15:26:41 2016 -0500
Committer: Mike Walch <mw...@apache.org>
Committed: Mon Dec 12 15:51:15 2016 -0500

----------------------------------------------------------------------
 INSTALL.md      | 104 ++++++++++++++++++++++++
 README          |  66 ---------------
 README.md       | 221 +++++++++++++++++++++++++++++++++++++++++++++++++++
 README.parallel |  65 ---------------
 4 files changed, 325 insertions(+), 131 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/INSTALL.md
----------------------------------------------------------------------
diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000..fff2bc0
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,104 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Wikisearch Installation
+
+Instructions for installing and running the Accumulo Wikisearch example.
+
+## Ingest
+ 
+### Prerequisites
+
+1. Accumulo, Hadoop, and ZooKeeper must be installed and running
+1. Download one or more [Wikipedia dump files][dump-files] and put them in an HDFS directory.
+   You will want to grab the files with the link name of `pages-articles.xml.bz2`. Though not strictly
+   required, the ingest will go more quickly if the files are decompressed:
+
+        $ bunzip2 < enwiki-*-pages-articles.xml.bz2 | hadoop fs -put - /wikipedia/enwiki-pages-articles.xml
+
+### Instructions
+	
+1. Create a `wikipedia.xml` file (or `wikipedia_parallel.xml` if running parallel version) from
+   [wikipedia.xml.example] or [wikipedia_parallel.xml.example] and modify for your Accumulo
+   installation.
+   
+        $ cd ingest/conf
+        $ cp wikipedia.xml.example wikipedia.xml
+        $ vim wikipedia.xml
+ 
+1. Copy `ingest/lib/wikisearch-*.jar` and `ingest/lib/protobuf*.jar` to `$ACCUMULO_HOME/lib/ext`
+1. Run `ingest/bin/ingest.sh` (or `ingest_parallel.sh` if running the parallel version) with one
+   argument: the HDFS directory where the Wikipedia XML files reside. This will kick off a
+   MapReduce job that ingests the data into Accumulo.
+
+## Query
+ 
+### Prerequisites
+
+1. The query software was tested using JBoss AS 6. Install it unless you feel like adapting the
+   installation to a different application server.
+   - NOTE: A [bug] was encountered that prevented deploying an EJB 3.1 WAR file. The workaround is to
+     separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
+
+### Instructions
+
+1. Create an `ejb-jar.xml` file from [ejb-jar.xml.example] and modify it to contain the same information
+   that you put into `wikipedia.xml` in the ingest steps above:
+
+        $ cd query/src/main/resources/META-INF/
+        $ cp ejb-jar.xml.example ejb-jar.xml
+        $ vim ejb-jar.xml
+
+1. Re-build the query distribution by running `mvn package assembly:single` in the query module's directory.
+1. Untar the resulting file in the `$JBOSS_HOME/server/default` directory.
+
+        $ cd $JBOSS_HOME/server/default
+        $ tar -xzf /some/path/to/wikisearch/query/target/wikisearch-query*.tar.gz
+ 
+   This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
+1. Next, copy the `wikisearch*.war` file in the `query-war/target` directory to `$JBOSS_HOME/server/default/deploy`.
+1. Start JBoss (`$JBOSS_HOME/bin/run.sh`)
+1. Use the Accumulo shell to give your user the authorizations for the wikis that you loaded:
+
+        > setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
+
+1. Copy the following jars to the `$ACCUMULO_HOME/lib/ext` directory from the `$JBOSS_HOME/server/default/lib` directory:
+	
+        kryo*.jar
+        minlog*.jar
+        commons-jexl*.jar
+
+1. Copy `$JBOSS_HOME/server/default/deploy/wikisearch-query*.jar` to `$ACCUMULO_HOME/lib/ext`.
+
+1. At this point you should be able to open a browser and view the page:
+
+        http://localhost:8080/accumulo-wikisearch/ui/ui.jsp
+
+   You can issue queries using this user interface or via the following REST URLs:
+
+        <host>/accumulo-wikisearch/rest/Query/xml
+        <host>/accumulo-wikisearch/rest/Query/html
+        <host>/accumulo-wikisearch/rest/Query/yaml
+        <host>/accumulo-wikisearch/rest/Query/json
+
+   There are two parameters to the REST service, `query` and `auths`. The `query` parameter is the same string
+   that you would type into the search box at `ui.jsp`, and the `auths` parameter is a comma-separated list of
+   wikis that you want to search (e.g. `enwiki,frwiki,dewiki`), or `all`. An example request is sketched below.
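+
+   The following is a minimal sketch of such a request issued from Java, assuming the two parameters are
+   accepted as URL query parameters and that JBoss is listening on `localhost:8080`; it is an illustration
+   only, not part of the example:
+
+        import java.io.BufferedReader;
+        import java.io.InputStreamReader;
+        import java.net.HttpURLConnection;
+        import java.net.URL;
+        import java.net.URLEncoder;
+
+        public class WikisearchQueryClient {
+          public static void main(String[] args) throws Exception {
+            // Hypothetical client: pass the same query string you would type into
+            // the search box at ui.jsp, and the auths you granted with setauths.
+            String query = URLEncoder.encode(args[0], "UTF-8");
+            String auths = URLEncoder.encode(args.length > 1 ? args[1] : "all", "UTF-8");
+            URL url = new URL("http://localhost:8080/accumulo-wikisearch/rest/Query/json"
+                + "?query=" + query + "&auths=" + auths);
+            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
+            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
+            String line;
+            while ((line = in.readLine()) != null) {
+              System.out.println(line);
+            }
+            in.close();
+          }
+        }
+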
+[ejb-jar.xml.example]: query/src/main/resources/META-INF/ejb-jar.xml.example
+[dump-files]: http://dumps.wikimedia.org/backup-index.html
+[wikipedia.xml.example]: ingest/conf/wikipedia.xml.example
+[wikipedia_parallel.xml.example]: ingest/conf/wikipedia_parallel.xml.example
+[bug]: https://issues.jboss.org/browse/RESTEASY-531

http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README
----------------------------------------------------------------------
diff --git a/README b/README
deleted file mode 100644
index ad28cdc..0000000
--- a/README
+++ /dev/null
@@ -1,66 +0,0 @@
- Apache Accumulo Wikipedia Search Example
-
- This project contains a sample application for ingesting and querying wikipedia data.
- 
-  
- Ingest
- ------
- 
- 	Prerequisites
- 	-------------
- 	1. Accumulo, Hadoop, and ZooKeeper must be installed and running
- 	2. One or more wikipedia dump files (http://dumps.wikimedia.org/backup-index.html) placed in an HDFS directory.
-	   You will want to grab the files with the link name of pages-articles.xml.bz2
-        3. Though not strictly required, the ingest will go more quickly if the files are decompressed:
-
-            $ bunzip2 < enwiki-*-pages-articles.xml.bz2 | hadoop fs -put - /wikipedia/enwiki-pages-articles.xml
-
- 
- 	INSTRUCTIONS
- 	------------
-	1. Copy the ingest/conf/wikipedia.xml.example to ingest/conf/wikipedia.xml and change it to specify Accumulo information. 
-	2. Copy the ingest/lib/wikisearch-*.jar and ingest/lib/protobuf*.jar to $ACCUMULO_HOME/lib/ext
-	3. Then run ingest/bin/ingest.sh with one argument (the name of the directory in HDFS where the wikipedia XML 
-           files reside) and this will kick off a MapReduce job to ingest the data into Accumulo.
-   
- Query
- -----
- 
- 	Prerequisites
- 	-------------
-	1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
- 	
-	NOTE: Ran into a bug (https://issues.jboss.org/browse/RESTEASY-531) that did not allow an EJB3.1 war file. The
-	workaround is to separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
-	
-	INSTRUCTIONS
-	-------------
-	1. Copy the query/src/main/resources/META-INF/ejb-jar.xml.example file to 
-	   query/src/main/resources/META-INF/ejb-jar.xml. Modify to the file to contain the same 
-	   information that you put into the wikipedia.xml file from the Ingest step above. 
-	2. Re-build the query distribution by running 'mvn package assembly:single' in the query module's directory.
-        3. Untar the resulting file in the $JBOSS_HOME/server/default directory.
-
-              $ cd $JBOSS_HOME/server/default
-              $ tar -xzf /some/path/to/wikisearch/query/target/wikisearch-query*.tar.gz
- 
-           This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
-	4. Next, copy the wikisearch*.war file in the query-war/target directory to $JBOSS_HOME/server/default/deploy. 
-	5. Start JBoss ($JBOSS_HOME/bin/run.sh)
-	6. Use the Accumulo shell and give the user permissions for the wikis that you loaded, for example: 
-			setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
-	7. Copy the following jars to the $ACCUMULO_HOME/lib/ext directory from the $JBOSS_HOME/server/default/lib directory:
-	
-		kryo*.jar
-		minlog*.jar
-		commons-jexl*.jar
-		
-	8. Copy the $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.
-
-
-	9. At this point you should be able to open a browser and view the page: http://localhost:8080/accumulo-wikisearch/ui/ui.jsp.
-	You can issue the queries using this user interface or via the following REST urls: <host>/accumulo-wikisearch/rest/Query/xml,
-	<host>/accumulo-wikisearch/rest/Query/html, <host>/accumulo-wikisearch/rest/Query/yaml, or <host>/accumulo-wikisearch/rest/Query/json.
-	There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
-	into the search box at ui.jsp, and the auths parameter is a comma-separated list of wikis that you want to search (i.e.
-	enwiki,frwiki,dewiki, etc. Or you can use all) 

http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..42289fe
--- /dev/null
+++ b/README.md
@@ -0,0 +1,221 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Apache Accumulo Wikisearch
+
+Wikisearch is an example Accumulo application that provides a flexible, scalable
+search over Wikipedia articles.
+
+## Installation
+
+Follow the [install instructions][install] to run the example.
+
+## Design
+
+The example uses an indexing technique helpful for doing multiple logical tests
+against content. In this case, we can perform a word search on Wikipedia
+articles. The sample application takes advantage of 3 unique capabilities of
+Accumulo:
+
+1. Extensible iterators that operate within the distributed tablet servers of
+   the key-value store
+1. Custom aggregators, which can efficiently condense information during the
+   various life-cycles of the log-structured merge tree
+1. Custom load balancing, which ensures that a table is evenly distributed on
+   all tablet servers
+
+In the example, Accumulo tracks the cardinality of all terms as elements are
+ingested. If the cardinality is small enough, it will track the set of
+documents by term directly. For example:
+
+| Row (word) | Value (count) | Value (document list)       |
+|------------|--------------:|:----------------------------|
+| Octopus    | 2             | [Document 57, Document 220] |
+| Other      | 172,849       | []                          |
+| Ostrich    | 1             | [Document 901]              |
+
+Searches can be optimized to focus on low-cardinality terms. To create these
+counts, the example installs "aggregators" which are used to combine inserted
+values. The ingester just writes simple "(Octopus, 1, Document 57)" tuples.
+The tablet servers then use the installed aggregators to merge the cells as
+the data is re-written or queried. This reduces the in-memory locking
+required to update high-cardinality terms and defers aggregation to a later
+time, when it can be done more efficiently.
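+
+As a rough sketch of how such an aggregator can be expressed with Accumulo's
+`Combiner` API (assuming a value that carries only a count; the actual example
+ships its own aggregator classes, which also carry the document list):
+
+    import java.util.Iterator;
+
+    import org.apache.accumulo.core.data.Key;
+    import org.apache.accumulo.core.data.Value;
+    import org.apache.accumulo.core.iterators.Combiner;
+
+    /**
+     * Sketch only: folds the simple "(term, 1)" cells written at ingest time
+     * into a single count during compactions and scans, so the ingester never
+     * has to read-modify-write a high-cardinality term.
+     */
+    public class TermCountCombiner extends Combiner {
+      @Override
+      public Value reduce(Key key, Iterator<Value> iter) {
+        long sum = 0;
+        while (iter.hasNext()) {
+          sum += Long.parseLong(new String(iter.next().get()));
+        }
+        return new Value(Long.toString(sum).getBytes());
+      }
+    }
+
+An iterator like this is attached at the scan, minor-compaction, and
+major-compaction scopes, which is what lets the merge happen lazily as data is
+rewritten or read.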
+
+The example also creates a reverse word index that maps each word to the
+documents in which it appears, but it does so within an arbitrarily chosen
+partition for each document. The article and the word index for the article
+are grouped together into the same partition. For example:
+
+| Row (partition) | Column Family | Column Qualifier | Value           |
+|-----------------|---------------|------------------|-----------------|
+| 1               | D             | Document 57      | "smart Octopus" |
+| 1               | Word, Octopus | Document 57      |                 |
+| 1               | Word, smart   | Document 57      |                 |
+| ...             |               |                  |                 |
+| 2               | D             | Document 220     | "big Octopus"   |
+| 2               | Word, big     | Document 220     |                 |
+| 2               | Word, Octopus | Document 220     |                 |
+
+Of course, there would be large numbers of documents in each partition, and the
+elements of those documents would be interlaced according to their sort order.
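+
+As a rough illustration of that layout (not the example's actual ingester,
+which builds these rows inside a MapReduce job and uses its own column-family
+encoding; the "d" and "word,<term>" families below are hypothetical), a
+document and its word index could be written into a single partition row like
+this:
+
+    import org.apache.accumulo.core.client.BatchWriter;
+    import org.apache.accumulo.core.client.BatchWriterConfig;
+    import org.apache.accumulo.core.client.Connector;
+    import org.apache.accumulo.core.data.Mutation;
+    import org.apache.accumulo.core.data.Value;
+    import org.apache.hadoop.io.Text;
+
+    public class PartitionWriter {
+      /** Sketch: co-locate a document and its word index in one partition row. */
+      public static void writeDocument(Connector conn, String partition,
+          String docId, String text) throws Exception {
+        BatchWriter writer = conn.createBatchWriter("wiki", new BatchWriterConfig());
+        Mutation m = new Mutation(new Text(partition));
+        // The document content lives under the "d" family, keyed by document id.
+        m.put(new Text("d"), new Text(docId), new Value(text.getBytes()));
+        // Each token is indexed in the same row, so one tablet holds both the
+        // word index and the content it points to.
+        for (String word : text.toLowerCase().split("\\s+")) {
+          m.put(new Text("word," + word), new Text(docId), new Value(new byte[0]));
+        }
+        writer.addMutation(m);
+        writer.close();
+      }
+    }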
+
+By dividing the index space into partitions, multi-word searches can be
+performed in parallel across all the nodes. Also, by grouping the document
+together with its index, a document can be retrieved without a second request
+from the client. The query "octopus" and "big" is sent to all the servers,
+but only the partitions that contain the low-cardinality term "octopus" need
+to be searched, and those partitions are found using the aggregated reverse
+index information. The lookup of a matching document is performed by
+extensions provided in the example, which become part of the tablet server's
+iterator stack. By cloning the underlying iterators, the query extensions can
+seek to specific words within the index, and when they find a matching
+document, they can then seek to the document location and retrieve its
+contents.
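+
+From the client's point of view, such a query amounts to a `BatchScanner`
+fanning out over all partitions with the example's iterators attached at scan
+time. A bare-bones sketch of the fan-out alone (omitting the iterator
+configuration that does the real work, and reusing the hypothetical
+"word,<term>" family from above):
+
+    import java.util.Collections;
+    import java.util.Map;
+
+    import org.apache.accumulo.core.client.BatchScanner;
+    import org.apache.accumulo.core.client.Connector;
+    import org.apache.accumulo.core.data.Key;
+    import org.apache.accumulo.core.data.Range;
+    import org.apache.accumulo.core.data.Value;
+    import org.apache.accumulo.core.security.Authorizations;
+    import org.apache.hadoop.io.Text;
+
+    public class PartitionQuery {
+      /** Sketch: scan one term's index entries across every partition in parallel. */
+      public static void findDocuments(Connector conn, String term) throws Exception {
+        BatchScanner scanner =
+            conn.createBatchScanner("wiki", new Authorizations("enwiki"), 10);
+        scanner.setRanges(Collections.singleton(new Range())); // all partitions
+        scanner.fetchColumnFamily(new Text("word," + term));
+        for (Map.Entry<Key,Value> entry : scanner) {
+          // The column qualifier holds the document id within that partition.
+          System.out.println(entry.getKey().getColumnQualifier());
+        }
+        scanner.close();
+      }
+    }
+
+The intersection of multiple terms happens inside the tablet servers via the
+example's iterators, so only matching documents ever cross the network.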
+
+## Performance
+
+The Wikisearch example was run on a cluster of 10 servers, each with 12 cores, 32G of
+RAM, and six 500G drives. Accumulo tablet servers were allowed a maximum of 3G of
+working memory, of which 2G was dedicated to caching file data.
+
+Following the instructions in the example, the Wikipedia XML data for articles
+was loaded for English, Spanish and German languages into 10 partitions. The
+data is not partitioned by language: multiple languages were used to get a
+larger set of test data. The data load took around 8 hours, and has not been
+optimized for scale. Once the data was loaded, the content was compacted, which
+took about 35 minutes.
+
+The example uses the language-specific tokenizers available from the Apache
+Lucene project for Wikipedia data.
+
+Original files:
+
+| Articles | Compressed size | Filename                               |
+|----------|-----------------|----------------------------------------|
+| 1.3M     | 2.5G            | dewiki-20111120-pages-articles.xml.bz2 |
+| 3.8M     | 7.9G            | enwiki-20111115-pages-articles.xml.bz2 |
+| 0.8M     | 1.4G            | eswiki-20111112-pages-articles.xml.bz2 |
+
+The resulting tables:
+
+    > du -p wiki.*
+          47,325,680,634 [wiki]
+           5,125,169,305 [wikiIndex]
+                     413 [wikiMetadata]
+           5,521,690,682 [wikiReverseIndex]
+
+Roughly a 6:1 increase in size.
+
+We performed the following queries, and repeated the set 5 times. The query
+language is much more expressive than what is shown below. The actual query
+specified that these words were to be found in the body of the article. Regular
+expressions, searches within titles, negative tests, etc. are available.
+
+| Query                                   | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) | Matches | Result Size |
+|-----------------------------------------|------|------|------|------|------|--------|-----------|
+| "old" and "man" and "sea"               | 4.07 | 3.79 | 3.65 | 3.85 | 3.67 | 22,956 | 3,830,102 |
+| "paris" and "in" and "the" and "spring" | 3.06 | 3.06 | 2.78 | 3.02 | 2.92 | 10,755 | 1,757,293 |
+| "rubber" and "ducky" and "ernie"        | 0.08 | 0.08 | 0.1  | 0.11 | 0.1  | 6      | 808       |
+| "fast" and ( "furious" or "furriest")   | 1.34 | 1.33 | 1.3  | 1.31 | 1.31 | 2,973  | 493,800   |
+| "slashdot" and "grok"                   | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 14     | 2,371     |
+| "three" and "little" and "pigs"         | 0.92 | 0.91 | 0.9  | 1.08 | 0.88 | 2,742  | 481,531   |
+
+Because the terms are tested together within the tablet server, even fairly
+high-cardinality terms such as "old," "man," and "sea" can be tested
+efficiently, without needing to return to the client, or make distributed calls
+between servers to perform the intersection between terms.
+
+For reference, here are the cardinalities for all the terms in the query
+(remember, this is across all languages loaded):
+
+| Term     | Cardinality |
+|----------|-------------|
+| ducky    | 795         |
+| ernie    | 13,433      |
+| fast     | 166,813     |
+| furious  | 10,535      |
+| furriest | 45          |
+| grok     | 1,168       |
+| in       | 1,884,638   |
+| little   | 320,748     |
+| man      | 548,238     |
+| old      | 720,795     |
+| paris    | 232,464     |
+| pigs     | 8,356       |
+| rubber   | 17,235      |
+| sea      | 247,231     |
+| slashdot | 2,343       |
+| spring   | 125,605     |
+| the      | 3,509,498   |
+| three    | 718,810     |
+
+Accumulo supports caching the index blocks of its files, which is turned on by
+default, and caching the non-index (data) blocks, which is not. After turning
+on data block caching for the wiki table:
+
+| Query                                   | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) |
+|-----------------------------------------|------|------|------|------|------|
+| "old" and "man" and "sea"               | 2.47 | 2.48 | 2.51 | 2.48 | 2.49 |
+| "paris" and "in" and "the" and "spring" | 1.33 | 1.42 | 1.6  | 1.61 | 1.47 |
+| "rubber" and "ducky" and "ernie"        | 0.07 | 0.08 | 0.07 | 0.07 | 0.07 |
+| "fast" and ( "furious" or "furriest")   | 1.28 | 0.78 | 0.77 | 0.79 | 0.78 |
+| "slashdot" and "grok"                   | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 |
+| "three" and "little" and "pigs"         | 0.55 | 0.32 | 0.32 | 0.31 | 0.27 |
+
+For comparison, these are the cold-start lookup times (after restarting Accumulo
+and dropping the operating system disk cache):
+
+| Query                                   | Sample (seconds) |
+|-----------------------------------------|------------------|
+| "old" and "man" and "sea"               | 13.92  |
+| "paris" and "in" and "the" and "spring" | 8.46   |
+| "rubber" and "ducky" and "ernie"        | 2.96   |
+| "fast" and ( "furious" or "furriest")   | 6.77   |
+| "slashdot" and "grok"                   | 4.06   |
+| "three" and "little" and "pigs"         | 8.13   |
+
+### Random Query Load
+
+Random queries were generated using common English words. A uniform random
+sample of 3 to 5 words taken from the 10,000 most common words in Project
+Gutenberg's online text collection was joined with "and". Words containing
+anything other than letters (such as contractions) were not used. A client was
+started simultaneously on each of the 10 servers and each ran 100 random
+queries (1000 queries total).
+
+| Time (seconds) | Count   |
+|----------------|---------|
+| 41.97 | 440,743 |
+| 41.61 | 320,522 |
+| 42.11 | 347,969 |
+| 38.32 | 275,655 |
+
+### Query Load During Ingest
+
+The English Wikipedia data was re-ingested on top of the existing, compacted
+data. The following query samples were taken at 5-minute intervals while
+ingesting 132 articles/second:
+
+| Query                                   | Sample 1 (seconds)  | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) |
+|-----------------------------------------|------|------|-------|------|-------|
+| "old" and "man" and "sea"               | 4.91 | 3.92 | 11.58 | 9.86 | 10.21 |
+| "paris" and "in" and "the" and "spring" | 5.03 | 3.37 | 12.22 | 3.29 | 9.46  |
+| "rubber" and "ducky" and "ernie"        | 4.21 | 2.04 | 8.57  | 1.54 | 1.68  |
+| "fast" and ( "furious" or "furriest")   | 5.84 | 2.83 | 2.56  | 3.12 | 3.09  |
+| "slashdot" and "grok"                   | 5.68 | 2.62 | 2.2   | 2.78 | 2.8   |
+| "three" and "little" and "pigs"         | 7.82 | 3.42 | 2.79  | 3.29 | 3.3   |
+
+[install]: INSTALL.md

http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README.parallel
----------------------------------------------------------------------
diff --git a/README.parallel b/README.parallel
deleted file mode 100644
index 399f0f3..0000000
--- a/README.parallel
+++ /dev/null
@@ -1,65 +0,0 @@
- Apache Accumulo Wikipedia Search Example (parallel version)
-
- This project contains a sample application for ingesting and querying wikipedia data.
- 
-  
- Ingest
- ------
- 
- 	Prerequisites
- 	-------------
- 	1. Accumulo, Hadoop, and ZooKeeper must be installed and running
- 	2. One or more wikipedia dump files (http://dumps.wikimedia.org/backup-index.html) placed in an HDFS directory.
-	     You will want to grab the files with the link name of pages-articles.xml.bz2
- 
- 
- 	INSTRUCTIONS
- 	------------
-	1. Copy the ingest/conf/wikipedia_parallel.xml.example to ingest/conf/wikipedia.xml and change it to specify Accumulo information. 
-	2. Copy the ingest/lib/wikisearch-*.jar and ingest/lib/protobuf*.jar to $ACCUMULO_HOME/lib/ext
-	3. Then run ingest/bin/ingest_parallel.sh with one argument (the name of the directory in HDFS where the wikipedia XML 
-             files reside) and this will kick off a MapReduce job to ingest the data into Accumulo.
-   
- Query
- -----
- 
- 	Prerequisites
- 	-------------
-	1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
- 	
-	NOTE: Ran into a bug (https://issues.jboss.org/browse/RESTEASY-531) that did not allow an EJB3.1 war file. The
-	workaround is to separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
-	
-	INSTRUCTIONS
-	-------------
-	1. Copy the query/src/main/resources/META-INF/ejb-jar.xml.example file to 
-	   query/src/main/resources/META-INF/ejb-jar.xml. Modify to the file to contain the same 
-	   information that you put into the wikipedia.xml file from the Ingest step above. 
-	2. Re-build the query distribution by running 'mvn package assembly:single' in the top-level directory. 
-        3. Untar the resulting file in the $JBOSS_HOME/server/default directory.
-
-              $ cd $JBOSS_HOME/server/default
-              $ tar -xzf $ACCUMULO_HOME/src/examples/wikisearch/query/target/wikisearch-query*.tar.gz
- 
-           This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
-	4. Next, copy the wikisearch*.war file in the query-war/target directory to $JBOSS_HOME/server/default/deploy. 
-	5. Start JBoss ($JBOSS_HOME/bin/run.sh)
-	6. Use the Accumulo shell and give the user permissions for the wikis that you loaded, for example: 
-			setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
-	7. Copy the following jars to the $ACCUMULO_HOME/lib/ext directory from the $JBOSS_HOME/server/default/lib directory:
-	
-		commons-lang*.jar
-		kryo*.jar
-		minlog*.jar
-		commons-jexl*.jar
-		guava*.jar
-		
-	8. Copy the $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.
-
-
-	9. At this point you should be able to open a browser and view the page: http://localhost:8080/accumulo-wikisearch/ui/ui.jsp.
-	You can issue the queries using this user interface or via the following REST urls: <host>/accumulo-wikisearch/rest/Query/xml,
-	<host>/accumulo-wikisearch/rest/Query/html, <host>/accumulo-wikisearch/rest/Query/yaml, or <host>/accumulo-wikisearch/rest/Query/json.
-	There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
-	into the search box at ui.jsp, and the auths parameter is a comma-separated list of wikis that you want to search (i.e.
-	enwiki,frwiki,dewiki, etc. Or you can use all)