Posted to commits@accumulo.apache.org by mw...@apache.org on 2016/12/13 14:17:42 UTC
accumulo-wikisearch git commit: Updated wikisearch documentation
Repository: accumulo-wikisearch
Updated Branches:
refs/heads/master 9c30660f6 -> 7fdf1bebb
Updated wikisearch documentation
* Made documentation use markdown
+* Combined regular and parallel install instructions
* Moved install instructions to INSTALL.md
* Pulled in design/performance documentation from website
Project: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/commit/7fdf1beb
Tree: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/tree/7fdf1beb
Diff: http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/diff/7fdf1beb
Branch: refs/heads/master
Commit: 7fdf1bebb2e2b4ca31d58d2d7fc8de8f157a63f3
Parents: 9c30660
Author: Mike Walch <mw...@apache.org>
Authored: Mon Dec 12 15:26:41 2016 -0500
Committer: Mike Walch <mw...@apache.org>
Committed: Mon Dec 12 15:51:15 2016 -0500
----------------------------------------------------------------------
INSTALL.md | 104 ++++++++++++++++++++++++
README | 66 ---------------
README.md | 221 +++++++++++++++++++++++++++++++++++++++++++++++++++
README.parallel | 65 ---------------
4 files changed, 325 insertions(+), 131 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/INSTALL.md
----------------------------------------------------------------------
diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000..fff2bc0
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,104 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Wikisearch Installation
+
+Instructions for installing and running the Accumulo Wikisearch example.
+
+## Ingest
+
+### Prerequisites
+
+1. Accumulo, Hadoop, and ZooKeeper must be installed and running
+1. Download one or more [wikipedia dump files][dump-files] and put them in an HDFS directory.
+ You will want to grab the files with the link name of pages-articles.xml.bz2. Though not strictly
+ required, the ingest will go more quickly if the files are decompressed:
+
+ $ bunzip2 < enwiki-*-pages-articles.xml.bz2 | hadoop fs -put - /wikipedia/enwiki-pages-articles.xml
+
+### Instructions
+
+1. Create a `wikipedia.xml` file (or `wikipedia_parallel.xml` if running parallel version) from
+ [wikipedia.xml.example] or [wikipedia_parallel.xml.example] and modify for your Accumulo
+ installation.
+
+ $ cd ingest/conf
+ $ cp wikipedia.xml.example wikipedia.xml
+ $ vim wikipedia.xml
+
+1. Copy `ingest/lib/wikisearch-*.jar` and `ingest/lib/protobuf*.jar` to `$ACCUMULO_HOME/lib/ext`
+1. Run `ingest/bin/ingest.sh` (or `ingest_parallel.sh` if running parallel version) with one
+ argument (the name of the directory in HDFS where the wikipedia XML files reside) and this will
+ kick off a MapReduce job to ingest the data into Accumulo.
+
+## Query
+
+### Prerequisites
+
+1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
+ - NOTE: A [bug] was encountered that does not allow an EJB 3.1 WAR file. The workaround is to separate the RESTEasy servlet
+ from the EJBs by creating an EJB jar and a WAR file.
+
+### Instructions
+
+1. Create an `ejb-jar.xml` file from [ejb-jar.xml.example] and modify it to contain the same
+ that you put into `wikipedia.xml` in the ingest steps above:
+
+ $ cd query/src/main/resources/META-INF/
+ $ cp ejb-jar.xml.example ejb-jar.xml
+ $ vim ejb-jar.xml
+
+1. Re-build the query distribution by running `mvn package assembly:single` in the query module's directory.
+1. Untar the resulting file in the `$JBOSS_HOME/server/default` directory.
+
+ $ cd $JBOSS_HOME/server/default
+ $ tar -xzf /some/path/to/wikisearch/query/target/wikisearch-query*.tar.gz
+
+ This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
+1. Next, copy the `wikisearch*.war` file in the `query-war/target` directory to `$JBOSS_HOME/server/default/deploy`.
+1. Start JBoss (`$JBOSS_HOME/bin/run.sh`)
+1. Use the Accumulo shell and give the user permissions for the wikis that you loaded:
+
+ > setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
+
+1. Copy the following jars to the `$ACCUMULO_HOME/lib/ext` directory from the `$JBOSS_HOME/server/default/lib` directory:
+
+ kryo*.jar
+ minlog*.jar
+ commons-jexl*.jar
+
+1. Copy `$JBOSS_HOME/server/default/deploy/wikisearch-query*.jar` to `$ACCUMULO_HOME/lib/ext`.
+
+1. At this point you should be able to open a browser and view the page:
+
+ http://localhost:8080/accumulo-wikisearch/ui/ui.jsp
+
+ You can issue the queries using this user interface or via the following REST URLs:
+
+ <host>/accumulo-wikisearch/rest/Query/xml
+ <host>/accumulo-wikisearch/rest/Query/html
+ <host>/accumulo-wikisearch/rest/Query/yaml
+ <host>/accumulo-wikisearch/rest/Query/json
+
+ There are two parameters to the REST service, `query` and `auths`. The `query` parameter is the same string that you would type
+ into the search box at ui.jsp, and the `auths` parameter is a comma-separated list of wikis that you want to search (e.g.
+ `enwiki,frwiki,dewiki`), or `all` to search every wiki that was loaded.
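As an illustrative sketch of calling the service: the endpoint paths and the `query`/`auths` parameter names come from the documentation above, but the host, the sample query string, and the use of Python are assumptions, not part of the example itself.

```python
# Sketch: construct a request URL for the Wikisearch REST service.
# Endpoint path and parameter names are from the docs above; the host
# and sample query are hypothetical.
from urllib.parse import urlencode

def wikisearch_url(host, fmt, query, auths):
    """Build a URL for one of the Query endpoints (xml/html/yaml/json)."""
    params = urlencode({"query": query, "auths": auths})
    return f"{host}/accumulo-wikisearch/rest/Query/{fmt}?{params}"

print(wikisearch_url("http://localhost:8080", "json",
                     "old and man and sea", "enwiki,frwiki"))
```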
+
+[ejb-jar.xml.example]: query/src/main/resources/META-INF/ejb-jar.xml.example
+[dump-files]: http://dumps.wikimedia.org/backup-index.html
+[wikipedia.xml.example]: ingest/conf/wikipedia.xml.example
+[wikipedia_parallel.xml.example]: ingest/conf/wikipedia_parallel.xml.example
+[bug]: https://issues.jboss.org/browse/RESTEASY-531
http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README
----------------------------------------------------------------------
diff --git a/README b/README
deleted file mode 100644
index ad28cdc..0000000
--- a/README
+++ /dev/null
@@ -1,66 +0,0 @@
- Apache Accumulo Wikipedia Search Example
-
- This project contains a sample application for ingesting and querying wikipedia data.
-
-
- Ingest
- ------
-
- Prerequisites
- -------------
- 1. Accumulo, Hadoop, and ZooKeeper must be installed and running
- 2. One or more wikipedia dump files (http://dumps.wikimedia.org/backup-index.html) placed in an HDFS directory.
- You will want to grab the files with the link name of pages-articles.xml.bz2
- 3. Though not strictly required, the ingest will go more quickly if the files are decompressed:
-
- $ bunzip2 < enwiki-*-pages-articles.xml.bz2 | hadoop fs -put - /wikipedia/enwiki-pages-articles.xml
-
-
- INSTRUCTIONS
- ------------
- 1. Copy the ingest/conf/wikipedia.xml.example to ingest/conf/wikipedia.xml and change it to specify Accumulo information.
- 2. Copy the ingest/lib/wikisearch-*.jar and ingest/lib/protobuf*.jar to $ACCUMULO_HOME/lib/ext
- 3. Then run ingest/bin/ingest.sh with one argument (the name of the directory in HDFS where the wikipedia XML
- files reside) and this will kick off a MapReduce job to ingest the data into Accumulo.
-
- Query
- -----
-
- Prerequisites
- -------------
- 1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
-
- NOTE: Ran into a bug (https://issues.jboss.org/browse/RESTEASY-531) that did not allow an EJB3.1 war file. The
- workaround is to separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
-
- INSTRUCTIONS
- -------------
- 1. Copy the query/src/main/resources/META-INF/ejb-jar.xml.example file to
- query/src/main/resources/META-INF/ejb-jar.xml. Modify to the file to contain the same
- information that you put into the wikipedia.xml file from the Ingest step above.
- 2. Re-build the query distribution by running 'mvn package assembly:single' in the query module's directory.
- 3. Untar the resulting file in the $JBOSS_HOME/server/default directory.
-
- $ cd $JBOSS_HOME/server/default
- $ tar -xzf /some/path/to/wikisearch/query/target/wikisearch-query*.tar.gz
-
- This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
- 4. Next, copy the wikisearch*.war file in the query-war/target directory to $JBOSS_HOME/server/default/deploy.
- 5. Start JBoss ($JBOSS_HOME/bin/run.sh)
- 6. Use the Accumulo shell and give the user permissions for the wikis that you loaded, for example:
- setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
- 7. Copy the following jars to the $ACCUMULO_HOME/lib/ext directory from the $JBOSS_HOME/server/default/lib directory:
-
- kryo*.jar
- minlog*.jar
- commons-jexl*.jar
-
- 8. Copy the $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.
-
-
- 9. At this point you should be able to open a browser and view the page: http://localhost:8080/accumulo-wikisearch/ui/ui.jsp.
- You can issue the queries using this user interface or via the following REST urls: <host>/accumulo-wikisearch/rest/Query/xml,
- <host>/accumulo-wikisearch/rest/Query/html, <host>/accumulo-wikisearch/rest/Query/yaml, or <host>/accumulo-wikisearch/rest/Query/json.
- There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
- into the search box at ui.jsp, and the auths parameter is a comma-separated list of wikis that you want to search (i.e.
- enwiki,frwiki,dewiki, etc. Or you can use all)
http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..42289fe
--- /dev/null
+++ b/README.md
@@ -0,0 +1,221 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Apache Accumulo Wikisearch
+
+Wikisearch is an example Accumulo application that provides a flexible, scalable
+search over Wikipedia articles.
+
+## Installation
+
+Follow the [install instructions][install] to run the example.
+
+## Design
+
+The example uses an indexing technique helpful for doing multiple logical tests
+against content. In this case, we can perform a word search on Wikipedia
+articles. The sample application takes advantage of 3 unique capabilities of
+Accumulo:
+
+1. Extensible iterators that operate within the distributed tablet servers of
+ the key-value store
+1. Custom aggregators which can efficiently condense information during the
+ various life-cycles of the log-structured merge tree
+1. Custom load balancing, which ensures that a table is evenly distributed on
+ all tablet servers
+
+In the example, Accumulo tracks the cardinality of all terms as elements are
+ingested. If the cardinality is small enough, it will track the set of
+documents by term directly. For example:
+
+| Row (word) | Value (count) | Value (document list) |
+|------------|--------------:|:----------------------------|
+| Octopus | 2 | [Document 57, Document 220] |
+| Other | 172,849 | [] |
+| Ostrich | 1 | [Document 901] |
+
+Searches can be optimized to focus on low-cardinality terms. To create these
+counts, the example installs "aggregators" which are used to combine inserted
+values. The ingester just writes simple "(Octopus, 1, Document 57)" tuples.
+The tablet servers then use the installed aggregators to merge the cells as
+the data is re-written or queried. This reduces the in-memory locking
+required to update high-cardinality terms, and defers aggregation to a later
+time, where it can be done more efficiently.
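The merge step can be sketched in Python. This is a hypothetical illustration of the logic described above, not the actual Java aggregator code, and the cardinality threshold is an assumption:

```python
# Illustration of the aggregation described above: each inserted cell is
# a (count, document list) pair for one term, and merging sums the
# counts, dropping the per-document list once the term is too common.

THRESHOLD = 20  # assumed cutoff; the real example's value may differ

def merge_cells(cells, threshold=THRESHOLD):
    """Combine cells for a single term, as would happen when the data is
    re-written during compaction or read at query time."""
    total, docs = 0, []
    for count, doc_list in cells:
        total += count
        docs.extend(doc_list)
    if total > threshold:
        docs = []  # high-cardinality term: track only the count
    return total, docs

# The ingester writes simple (Octopus, 1, Document 57)-style tuples:
cells = [(1, ["Document 57"]), (1, ["Document 220"])]
print(merge_cells(cells))  # (2, ['Document 57', 'Document 220'])
```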
+
+The example also creates a reverse word index to map each word to the document
+in which it appears. But it does this by choosing an arbitrary partition for
+the document. The article and the word index for the article are grouped
+together into the same partition. For example:
+
+| Row (partition) | Column Family | Column Qualifier | Value |
+|-----------------|---------------|------------------|-----------------|
+| 1 | D | Document 57 | "smart Octopus" |
+| 1 | Word, Octopus | Document 57 | |
+| 1 | Word, smart | Document 57 | |
+| ... | | | |
+| 2 | D | Document 220 | "big Octopus" |
+| 2 | Word, big | Document 220 | |
+| 2 | Word, Octopus | Document 220 | |
+
+Of course, there would be large numbers of documents in each partition, and the
+elements of those documents would be interlaced according to their sort order.
+
+By dividing the index space into partitions, the multi-word searches can be
+performed in parallel across all the nodes. Also, by grouping the document
+together with its index, a document can be retrieved without a second request
+from the client. The query "octopus" and "big" will be performed on all the
+servers, but, using the aggregated reverse index information, the search can be
+limited to those partitions that contain the low-cardinality term
+"octopus". The query for a
+document is performed by extensions provided in the example. These extensions
+become part of the tablet server's iterator stack. By cloning the underlying
+iterators, the query extensions can seek to specific words within the index,
+and when it finds a matching document, it can then seek to the document
+location and retrieve the contents.
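A toy Python model of this layout may help. It is illustrative only: the partition contents and the API are invented for the sketch, not Accumulo's.

```python
# Toy model of the partitioned schema above: each partition stores both
# the documents and the word index for those documents, so an AND-query
# can intersect terms and fetch matching content inside one partition.

partitions = {
    1: {"docs": {"Document 57": "smart Octopus"},
        "index": {"octopus": {"Document 57"}, "smart": {"Document 57"}}},
    2: {"docs": {"Document 220": "big Octopus"},
        "index": {"octopus": {"Document 220"}, "big": {"Document 220"}}},
}

def search(terms):
    """Run the intersection in every partition (in parallel on a real
    cluster) and return matching documents without a second lookup."""
    results = {}
    for part in partitions.values():
        matching = set.intersection(
            *(part["index"].get(t, set()) for t in terms))
        for doc_id in matching:
            results[doc_id] = part["docs"][doc_id]
    return results

print(search(["octopus", "big"]))  # {'Document 220': 'big Octopus'}
```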
+
+## Performance
+
+The Wikisearch example was run on a cluster of 10 servers, each with 12 cores,
+32G RAM, and 6 500G drives. Accumulo tablet servers were allowed a maximum of
+3G of working memory, of which 2G was dedicated to caching file data.
+
+Following the instructions in the example, the Wikipedia XML article data
+was loaded for the English, Spanish, and German languages into 10 partitions. The
+data is not partitioned by language: multiple languages were used to get a
+larger set of test data. The data load took around 8 hours, and has not been
+optimized for scale. Once the data was loaded, the content was compacted, which
+took about 35 minutes.
+
+The example uses the language-specific tokenizers available from the Apache
+Lucene project for Wikipedia data.
+
+Original files:
+
+| Articles | Compressed size | Filename |
+|----------|-----------------|----------------------------------------|
+| 1.3M | 2.5G | dewiki-20111120-pages-articles.xml.bz2 |
+| 3.8M | 7.9G | enwiki-20111115-pages-articles.xml.bz2 |
+| 0.8M | 1.4G | eswiki-20111112-pages-articles.xml.bz2 |
+
+The resulting tables:
+
+ > du -p wiki.*
+ 47,325,680,634 [wiki]
+ 5,125,169,305 [wikiIndex]
+ 413 [wikiMetadata]
+ 5,521,690,682 [wikiReverseIndex]
+
+Roughly a 6:1 increase in size.
+
+We performed the following queries, and repeated the set 5 times. The query
+language is much more expressive than what is shown below. The actual query
+specified that these words were to be found in the body of the article. Regular
+expressions, searches within titles, negative tests, etc. are available.
+
+| Query | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) | Matches | Result Size |
+|-----------------------------------------|------|------|------|------|------|--------|-----------|
+| "old" and "man" and "sea" | 4.07 | 3.79 | 3.65 | 3.85 | 3.67 | 22,956 | 3,830,102 |
+| "paris" and "in" and "the" and "spring" | 3.06 | 3.06 | 2.78 | 3.02 | 2.92 | 10,755 | 1,757,293 |
+| "rubber" and "ducky" and "ernie" | 0.08 | 0.08 | 0.1 | 0.11 | 0.1 | 6 | 808 |
+| "fast" and ( "furious" or "furriest") | 1.34 | 1.33 | 1.3 | 1.31 | 1.31 | 2,973 | 493,800 |
+| "slashdot" and "grok" | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 14 | 2,371 |
+| "three" and "little" and "pigs" | 0.92 | 0.91 | 0.9 | 1.08 | 0.88 | 2,742 | 481,531 |
+
+Because the terms are tested together within the tablet server, even fairly
+high-cardinality terms such as "old," "man," and "sea" can be tested
+efficiently, without needing to return to the client, or make distributed calls
+between servers to perform the intersection between terms.
+
+For reference, here are the cardinalities for all the terms in the query
+(remember, this is across all languages loaded):
+
+| Term | Cardinality |
+|----------|-------------|
+| ducky | 795 |
+| ernie | 13,433 |
+| fast | 166,813 |
+| furious | 10,535 |
+| furriest | 45 |
+| grok | 1,168 |
+| in | 1,884,638 |
+| little | 320,748 |
+| man | 548,238 |
+| old | 720,795 |
+| paris | 232,464 |
+| pigs | 8,356 |
+| rubber | 17,235 |
+| sea | 247,231 |
+| slashdot | 2,343 |
+| spring | 125,605 |
+| the | 3,509,498 |
+| three | 718,810 |
+
+Accumulo supports caching of index information, which is turned on by default,
+and of the non-index (data) blocks of a file, which is not. After turning on
+data block caching for the wiki table:
+
+| Query | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) |
+|-----------------------------------------|------|------|------|------|------|
+| "old" and "man" and "sea" | 2.47 | 2.48 | 2.51 | 2.48 | 2.49 |
+| "paris" and "in" and "the" and "spring" | 1.33 | 1.42 | 1.6 | 1.61 | 1.47 |
+| "rubber" and "ducky" and "ernie" | 0.07 | 0.08 | 0.07 | 0.07 | 0.07 |
+| "fast" and ( "furious" or "furriest") | 1.28 | 0.78 | 0.77 | 0.79 | 0.78 |
+| "slashdot" and "grok" | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 |
+| "three" and "little" and "pigs" | 0.55 | 0.32 | 0.32 | 0.31 | 0.27 |
+
+For comparison, these are the cold start lookup times (restart Accumulo, and
+drop the operating system disk cache):
+
+| Query | Sample |
+|-----------------------------------------|--------|
+| "old" and "man" and "sea" | 13.92 |
+| "paris" and "in" and "the" and "spring" | 8.46 |
+| "rubber" and "ducky" and "ernie" | 2.96 |
+| "fast" and ( "furious" or "furriest") | 6.77 |
+| "slashdot" and "grok" | 4.06 |
+| "three" and "little" and "pigs" | 8.13 |
+
+### Random Query Load
+
+Random queries were generated using common English words. A uniform random
+sample of 3 to 5 words, taken from the 10,000 most common words in Project
+Gutenberg's online text collection, was joined with "and". Words containing
+anything other than letters (such as contractions) were not used. A client was
+started simultaneously on each of the 10 servers, and each ran 100 random
+queries (1,000 queries total).
+
+| Time (seconds) | Count |
+|-------|---------|
+| 41.97 | 440,743 |
+| 41.61 | 320,522 |
+| 42.11 | 347,969 |
+| 38.32 | 275,655 |
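The query generation described above can be sketched as follows. The word list and seed here are small stand-ins; the actual 10,000-word Gutenberg list is not reproduced.

```python
# Sketch of the random query generator: sample 3 to 5 purely alphabetic
# words uniformly from a common-word list and join them with "and".
import random

COMMON_WORDS = ["the", "of", "little", "man", "sea", "old",
                "spring", "three", "pigs", "don't"]  # stand-in list

def random_query(words, rng):
    alphabetic = [w for w in words if w.isalpha()]  # drop contractions
    sample = rng.sample(alphabetic, rng.randint(3, 5))
    return " and ".join(sample)

rng = random.Random(1)
for _ in range(3):
    print(random_query(COMMON_WORDS, rng))
```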
+
+### Query Load During Ingest
+
+The English Wikipedia data was re-ingested on top of the existing, compacted
+data. The following query samples were taken in 5 minute intervals while
+ingesting 132 articles/second:
+
+| Query | Sample 1 (seconds) | Sample 2 (seconds) | Sample 3 (seconds) | Sample 4 (seconds) | Sample 5 (seconds) |
+|-----------------------------------------|------|------|-------|------|-------|
+| "old" and "man" and "sea" | 4.91 | 3.92 | 11.58 | 9.86 | 10.21 |
+| "paris" and "in" and "the" and "spring" | 5.03 | 3.37 | 12.22 | 3.29 | 9.46 |
+| "rubber" and "ducky" and "ernie" | 4.21 | 2.04 | 8.57 | 1.54 | 1.68 |
+| "fast" and ( "furious" or "furriest") | 5.84 | 2.83 | 2.56 | 3.12 | 3.09 |
+| "slashdot" and "grok" | 5.68 | 2.62 | 2.2 | 2.78 | 2.8 |
+| "three" and "little" and "pigs" | 7.82 | 3.42 | 2.79 | 3.29 | 3.3 |
+
+[install]: INSTALL.md
http://git-wip-us.apache.org/repos/asf/accumulo-wikisearch/blob/7fdf1beb/README.parallel
----------------------------------------------------------------------
diff --git a/README.parallel b/README.parallel
deleted file mode 100644
index 399f0f3..0000000
--- a/README.parallel
+++ /dev/null
@@ -1,65 +0,0 @@
- Apache Accumulo Wikipedia Search Example (parallel version)
-
- This project contains a sample application for ingesting and querying wikipedia data.
-
-
- Ingest
- ------
-
- Prerequisites
- -------------
- 1. Accumulo, Hadoop, and ZooKeeper must be installed and running
- 2. One or more wikipedia dump files (http://dumps.wikimedia.org/backup-index.html) placed in an HDFS directory.
- You will want to grab the files with the link name of pages-articles.xml.bz2
-
-
- INSTRUCTIONS
- ------------
- 1. Copy the ingest/conf/wikipedia_parallel.xml.example to ingest/conf/wikipedia.xml and change it to specify Accumulo information.
- 2. Copy the ingest/lib/wikisearch-*.jar and ingest/lib/protobuf*.jar to $ACCUMULO_HOME/lib/ext
- 3. Then run ingest/bin/ingest_parallel.sh with one argument (the name of the directory in HDFS where the wikipedia XML
- files reside) and this will kick off a MapReduce job to ingest the data into Accumulo.
-
- Query
- -----
-
- Prerequisites
- -------------
- 1. The query software was tested using JBoss AS 6. Install this unless you feel like messing with the installation.
-
- NOTE: Ran into a bug (https://issues.jboss.org/browse/RESTEASY-531) that did not allow an EJB3.1 war file. The
- workaround is to separate the RESTEasy servlet from the EJBs by creating an EJB jar and a WAR file.
-
- INSTRUCTIONS
- -------------
- 1. Copy the query/src/main/resources/META-INF/ejb-jar.xml.example file to
- query/src/main/resources/META-INF/ejb-jar.xml. Modify to the file to contain the same
- information that you put into the wikipedia.xml file from the Ingest step above.
- 2. Re-build the query distribution by running 'mvn package assembly:single' in the top-level directory.
- 3. Untar the resulting file in the $JBOSS_HOME/server/default directory.
-
- $ cd $JBOSS_HOME/server/default
- $ tar -xzf $ACCUMULO_HOME/src/examples/wikisearch/query/target/wikisearch-query*.tar.gz
-
- This will place the dependent jars in the lib directory and the EJB jar into the deploy directory.
- 4. Next, copy the wikisearch*.war file in the query-war/target directory to $JBOSS_HOME/server/default/deploy.
- 5. Start JBoss ($JBOSS_HOME/bin/run.sh)
- 6. Use the Accumulo shell and give the user permissions for the wikis that you loaded, for example:
- setauths -u <user> -s all,enwiki,eswiki,frwiki,fawiki
- 7. Copy the following jars to the $ACCUMULO_HOME/lib/ext directory from the $JBOSS_HOME/server/default/lib directory:
-
- commons-lang*.jar
- kryo*.jar
- minlog*.jar
- commons-jexl*.jar
- guava*.jar
-
- 8. Copy the $JBOSS_HOME/server/default/deploy/wikisearch-query*.jar to $ACCUMULO_HOME/lib/ext.
-
-
- 9. At this point you should be able to open a browser and view the page: http://localhost:8080/accumulo-wikisearch/ui/ui.jsp.
- You can issue the queries using this user interface or via the following REST urls: <host>/accumulo-wikisearch/rest/Query/xml,
- <host>/accumulo-wikisearch/rest/Query/html, <host>/accumulo-wikisearch/rest/Query/yaml, or <host>/accumulo-wikisearch/rest/Query/json.
- There are two parameters to the REST service, query and auths. The query parameter is the same string that you would type
- into the search box at ui.jsp, and the auths parameter is a comma-separated list of wikis that you want to search (i.e.
- enwiki,frwiki,dewiki, etc. Or you can use all)