
Posted to commits@opennlp.apache.org by co...@apache.org on 2017/05/20 10:49:19 UTC

[1/3] opennlp-site git commit: OPENNLP-1069: Add missing docs and automate the inclusion process

Repository: opennlp-site
Updated Branches:
  refs/heads/master d74013d1e -> 08c3208cd


http://git-wip-us.apache.org/repos/asf/opennlp-site/blob/08c3208c/src/main/jbake/content/docs/index.ad
----------------------------------------------------------------------
diff --git a/src/main/jbake/content/docs/index.ad b/src/main/jbake/content/docs/index.ad
index 358e7f6..4249d10 100755
--- a/src/main/jbake/content/docs/index.ad
+++ b/src/main/jbake/content/docs/index.ad
@@ -34,3 +34,5 @@ explains how the various OpenNLP components can be used and trained.
 * link:/docs/1.8.0/apidocs/opennlp-morfologik-addon/index.html[Apache OpenNLP Morfologik Addon Javadoc]
 
 Note: All the documentation is also included in the binary distribution.
+
+Documentation for archived releases can be found link:/docs/legacy.html[here].
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-site/blob/08c3208c/src/main/jbake/content/docs/legacy.ad
----------------------------------------------------------------------
diff --git a/src/main/jbake/content/docs/legacy.ad b/src/main/jbake/content/docs/legacy.ad
new file mode 100755
index 0000000..a3aa981
--- /dev/null
+++ b/src/main/jbake/content/docs/legacy.ad
@@ -0,0 +1,64 @@
+////
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.   
+////
+= Documentation
+:jbake-type: page
+:jbake-tags: documentation
+:jbake-status: published
+:idprefix:
+
+WARNING: This page contains the archived documentation. Please refer to link:/docs/index.html[Apache OpenNLP Manual] for the current documentation.
+
+There exists a manual and Javadoc API documentation for Apache OpenNLP. The manual
+explains how the various OpenNLP components can be used and trained.
+
+### Apache OpenNLP 1.7.2 documentation
+
+* link:/docs/1.7.2/manual/opennlp.html[Apache OpenNLP Manual]
+* link:/docs/1.7.2/apidocs/opennlp-tools/index.html[Apache OpenNLP Tools Javadoc]
+* link:/docs/1.7.2/apidocs/opennlp-uima/index.html[Apache OpenNLP UIMA Javadoc]
+* link:/docs/1.7.2/apidocs/opennlp-brat-annotator/index.html[Apache OpenNLP BRAT Annotator Javadoc]
+* link:/docs/1.7.2/apidocs/opennlp-morfologik-addon/index.html[Apache OpenNLP Morfologik Addon Javadoc]
+
+### Apache OpenNLP 1.7.1 documentation
+
+* link:/docs/1.7.1/manual/opennlp.html[Apache OpenNLP Manual]
+* link:/docs/1.7.1/apidocs/opennlp-tools/index.html[Apache OpenNLP Tools Javadoc]
+* link:/docs/1.7.1/apidocs/opennlp-uima/index.html[Apache OpenNLP UIMA Javadoc]
+* link:/docs/1.7.1/apidocs/opennlp-brat-annotator/index.html[Apache OpenNLP BRAT Annotator Javadoc]
+* link:/docs/1.7.1/apidocs/opennlp-morfologik-addon/index.html[Apache OpenNLP Morfologik Addon Javadoc]
+
+### Apache OpenNLP 1.7.0 documentation
+
+* link:/docs/1.7.0/manual/opennlp.html[Apache OpenNLP Manual]
+* link:/docs/1.7.0/apidocs/opennlp-tools/index.html[Apache OpenNLP Tools Javadoc]
+* link:/docs/1.7.0/apidocs/opennlp-uima/index.html[Apache OpenNLP UIMA Javadoc]
+* link:/docs/1.7.0/apidocs/opennlp-brat-annotator/index.html[Apache OpenNLP BRAT Annotator Javadoc]
+* link:/docs/1.7.0/apidocs/opennlp-morfologik-addon/index.html[Apache OpenNLP Morfologik Addon Javadoc]
+
+### Apache OpenNLP 1.6.0 documentation
+
+* link:/docs/1.6.0/manual/opennlp.html[Apache OpenNLP Manual]
+* link:/docs/1.6.0/apidocs/opennlp-tools/index.html[Apache OpenNLP Tools Javadoc]
+* link:/docs/1.6.0/apidocs/opennlp-uima/index.html[Apache OpenNLP UIMA Javadoc]
+
+### Apache OpenNLP 1.5.3 documentation
+
+* link:/docs/1.5.3/manual/opennlp.html[Apache OpenNLP Manual]
+* link:/docs/1.5.3/apidocs/opennlp-tools/index.html[Apache OpenNLP Tools Javadoc]
+* link:/docs/1.5.3/apidocs/opennlp-uima/index.html[Apache OpenNLP UIMA Javadoc]

[2/3] opennlp-site git commit: OPENNLP-1069: Add missing docs and automate the inclusion process

Posted by co...@apache.org.

http://git-wip-us.apache.org/repos/asf/opennlp-site/blob/08c3208c/src/main/docs/1.7.2/manual/opennlp.html
----------------------------------------------------------------------
diff --git a/src/main/docs/1.7.2/manual/opennlp.html b/src/main/docs/1.7.2/manual/opennlp.html
deleted file mode 100644
index 84dc967..0000000
--- a/src/main/docs/1.7.2/manual/opennlp.html
+++ /dev/null
@@ -1,5388 +0,0 @@
-<html><head>
-      <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
-   <title>Apache OpenNLP Developer Documentation</title><link rel="stylesheet" href="css/opennlp-docs.css" type="text/css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.75.2"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="book" title="Apache OpenNLP Developer Documentation"><div class="titlepage"><div><div><h1 class="title"><a name="d4e1"></a>Apache OpenNLP Developer Documentation</h1></div><div><div class="authorgroup">
-			<h3 class="corpauthor">Written and maintained by the Apache OpenNLP Development
-				Community</h3>
-		</div></div><div><p class="releaseinfo">
-			Version 1.7.2
-		</p></div><div><p class="copyright">Copyright &copy; 2011, 2017 The Apache Software Foundation</p></div><div><div class="legalnotice" title="Legal Notice"><a name="d4e7"></a>
-			<p title="License and Disclaimer">
-				<b>License and Disclaimer.&nbsp;</b>
-				
-					The ASF licenses this documentation
-					to you under the Apache License,
-					Version 2.0 (the
-					"License"); you may not use this documentation
-					except in compliance
-					with the License. You may obtain a copy of the
-					License at
-
-					</p><div class="blockquote"><blockquote class="blockquote">
-						<p>
-							<a class="ulink" href="http://www.apache.org/licenses/LICENSE-2.0" target="_top">http://www.apache.org/licenses/LICENSE-2.0</a>
-						</p>
-					</blockquote></div><p title="License and Disclaimer">
-
-					Unless required by applicable law or agreed to in writing,
-					this documentation and its contents are distributed under the License
-					on an
-					"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-					KIND, either express or implied. See the License for the
-					specific language governing permissions and limitations
-					under the License.
-				
-			</p>
-		</div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="chapter"><a href="#opennlp">1. Introduction</a></span></dt><dd><dl><dt><span class="section"><a href="#intro.description">Description</a></span></dt><dt><span class="section"><a href="#intro.general.library.structure">General Library Structure</a></span></dt><dt><span class="section"><a href="#intro.api">Application Program Interface (API). Generic Example</a></span></dt><dt><span class="section"><a href="#intro.cli">Command line interface (CLI)</a></span></dt><dd><dl><dt><span class="section"><a href="#intro.cli.description">Description</a></span></dt><dt><span class="section"><a href="#intro.cli.toolslist">List of tools</a></span></dt><dt><span class="section"><a href="#intro.cli.setup">Setting up</a></span></dt><dt><span class="section"><a href="#intro.cli.generic">Generic Example</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#tools.sentdetect">2. Sentence De
 tector</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.sentdetect.detection">Sentence Detection</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.sentdetect.detection.cmdline">Sentence Detection Tool</a></span></dt><dt><span class="section"><a href="#tools.sentdetect.detection.api">Sentence Detection API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.sentdetect.training">Sentence Detector Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.sentdetect.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.sentdetect.training.api">Training API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.sentdetect.eval">Evaluation</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.sentdetect.eval.tool">Evaluation Tool</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#tools.tokenizer">3. Tokenizer</a></span></dt><dd><dl><dt><span class="secti
 on"><a href="#tools.tokenizer.introduction">Tokenization</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.tokenizer.cmdline">Tokenizer Tools</a></span></dt><dt><span class="section"><a href="#tools.tokenizer.api">Tokenizer API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.tokenizer.training">Tokenizer Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.tokenizer.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.tokenizer.training.api">Training API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.tokenizer.detokenizing">Detokenizing</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.tokenizer.detokenizing.api">Detokenizing API</a></span></dt><dt><span class="section"><a href="#tools.tokenizer.detokenizing.dict">Detokenizer Dictionary</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#tools.namefind">4. Name Finder</a></span></dt><dd><dl><d
 t><span class="section"><a href="#tools.namefind.recognition">Named Entity Recognition</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.namefind.recognition.cmdline">Name Finder Tool</a></span></dt><dt><span class="section"><a href="#tools.namefind.recognition.api">Name Finder API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.namefind.training">Name Finder Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.namefind.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.namefind.training.api">Training API</a></span></dt><dt><span class="section"><a href="#tools.namefind.training.featuregen">Custom Feature Generation</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.namefind.eval">Evaluation</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.namefind.eval.tool">Evaluation Tool</a></span></dt><dt><span class="section"><a href="#tools.namefind.eval.api">Evaluation AP
 I</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.namefind.annotation_guides">Named Entity Annotation Guidelines</a></span></dt></dl></dd><dt><span class="chapter"><a href="#tools.doccat">5. Document Categorizer</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.doccat.classifying">Classifying</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.doccat.classifying.cmdline">Document Categorizer Tool</a></span></dt><dt><span class="section"><a href="#tools.doccat.classifying.api">Document Categorizer API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.doccat.training">Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.doccat.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.doccat.training.api">Training API</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#tools.postagger">6. Part-of-Speech Tagger</a></span></dt><dd><dl><dt><span class="sectio
 n"><a href="#tools.postagger.tagging">Tagging</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.postagger.tagging.cmdline">POS Tagger Tool</a></span></dt><dt><span class="section"><a href="#tools.postagger.tagging.api">POS Tagger API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.postagger.training">Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.postagger.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.postagger.training.api">Training API</a></span></dt><dt><span class="section"><a href="#tools.postagger.training.tagdict">Tag Dictionary</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.postagger.eval">Evaluation</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.postagger.eval.tool">Evaluation Tool</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#tools.lemmatizer">7. Lemmatizer</a></span></dt><dd><dl><dt><span class="section"><a hre
 f="#tools.lemmatizer.tagging.cmdline">Lemmatizer Tool</a></span></dt><dt><span class="section"><a href="#tools.lemmatizer.tagging.api">Lemmatizer API</a></span></dt><dt><span class="section"><a href="#tools.lemmatizer.training">Lemmatizer Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.lemmatizer.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.lemmatizer.training.api">Training API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.lemmatizer.evaluation">Lemmatizer Evaluation</a></span></dt></dl></dd><dt><span class="chapter"><a href="#tools.chunker">8. Chunker</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.parser.chunking">Chunking</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.parser.chunking.cmdline">Chunker Tool</a></span></dt><dt><span class="section"><a href="#tools.parser.chunking.api">Chunking API</a></span></dt></dl></dd><dt><span class="section"><a href="#tool
 s.chunker.training">Chunker Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.chunker.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.chunker.training.api">Training API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.chunker.evaluation">Chunker Evaluation</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.chunker.evaluation.tool">Chunker Evaluation Tool</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#tools.parser">9. Parser</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.parser.parsing">Parsing</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.parser.parsing.cmdline">Parser Tool</a></span></dt><dt><span class="section"><a href="#tools.parser.parsing.api">Parsing API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.parser.training">Parser Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.parser.
 training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.parser.training.api">Training API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.parser.evaluation">Parser Evaluation</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.parser.evaluation.tool">Parser Evaluation Tool</a></span></dt><dt><span class="section"><a href="#tools.parser.evaluation.api">Evaluation API</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#tools.coref">10. Coreference Resolution</a></span></dt><dt><span class="chapter"><a href="#tools.extension">11. Extending OpenNLP</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.extension.writing">Writing an extension</a></span></dt><dt><span class="section"><a href="#tools.extension.osgi">Running in an OSGi container</a></span></dt></dl></dd><dt><span class="chapter"><a href="#tools.corpora">12. Corpora</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.corpo
 ra.conll">CONLL</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.corpora.conll.2000">CONLL 2000</a></span></dt><dt><span class="section"><a href="#tools.corpora.conll.2002">CONLL 2002</a></span></dt><dt><span class="section"><a href="#tools.corpora.conll.2003">CONLL 2003</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.corpora.arvores-deitadas">Arvores Deitadas</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.corpora.arvores-deitadas.getting">Getting the data</a></span></dt><dt><span class="section"><a href="#tools.corpora.arvores-deitadas.converting">Converting the data (optional)</a></span></dt><dt><span class="section"><a href="#tools.corpora.arvores-deitadas.evaluation">Training and Evaluation</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.corpora.leipzig">Leipzig Corpora</a></span></dt><dt><span class="section"><a href="#tools.corpora.ontonotes">OntoNotes Release 4.0</a></span></dt><dd><dl><dt><span class="se
 ction"><a href="#tools.corpora.ontonotes.namefinder">Name Finder Training</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.corpora.brat">Brat Format Support</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.corpora.brat.webtool">Sentences and Tokens</a></span></dt><dt><span class="section"><a href="#tools.corpora.brat.training">Training</a></span></dt><dt><span class="section"><a href="#tools.corpora.brat.evaluation">Evaluation</a></span></dt><dt><span class="section"><a href="#tools.corpora.brat.cross-validation">Cross Validation</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#opennlp.ml">13. Machine Learning</a></span></dt><dd><dl><dt><span class="section"><a href="#opennlp.ml.maxent">Maximum Entropy</a></span></dt><dd><dl><dt><span class="section"><a href="#opennlp.ml.maxent.impl">Implementation</a></span></dt></dl></dd></dl></dd><dt><span class="chapter"><a href="#org.apche.opennlp.uima">14. UIMA Integration</a></span></dt>
 <dd><dl><dt><span class="section"><a href="#org.apche.opennlp.running-pear-sample">Running the pear sample in CVD</a></span></dt><dt><span class="section"><a href="#org.apche.opennlp.further-help">Further Help</a></span></dt></dl></dd><dt><span class="chapter"><a href="#tools.morfologik-addon">15. Morfologik Addon</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.morfologik-addon.api">Morfologik Integration</a></span></dt><dt><span class="section"><a href="#tools.morfologik-addon.cmdline">Morfologik CLI Tools</a></span></dt></dl></dd><dt><span class="chapter"><a href="#tools.cli">16. The Command Line Interface</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.doccat">Doccat</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.doccat.Doccat">Doccat</a></span></dt><dt><span class="section"><a href="#tools.cli.doccat.DoccatTrainer">DoccatTrainer</a></span></dt><dt><span class="section"><a href="#tools.cli.doccat.DoccatEvaluator">DoccatE
 valuator</a></span></dt><dt><span class="section"><a href="#tools.cli.doccat.DoccatCrossValidator">DoccatCrossValidator</a></span></dt><dt><span class="section"><a href="#tools.cli.doccat.DoccatConverter">DoccatConverter</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.dictionary">Dictionary</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.dictionary.DictionaryBuilder">DictionaryBuilder</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.tokenizer">Tokenizer</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.tokenizer.SimpleTokenizer">SimpleTokenizer</a></span></dt><dt><span class="section"><a href="#tools.cli.tokenizer.TokenizerME">TokenizerME</a></span></dt><dt><span class="section"><a href="#tools.cli.tokenizer.TokenizerTrainer">TokenizerTrainer</a></span></dt><dt><span class="section"><a href="#tools.cli.tokenizer.TokenizerMEEvaluator">TokenizerMEEvaluator</a></span></dt><dt><span class="section"><a h
 ref="#tools.cli.tokenizer.TokenizerCrossValidator">TokenizerCrossValidator</a></span></dt><dt><span class="section"><a href="#tools.cli.tokenizer.TokenizerConverter">TokenizerConverter</a></span></dt><dt><span class="section"><a href="#tools.cli.tokenizer.DictionaryDetokenizer">DictionaryDetokenizer</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.sentdetect">Sentdetect</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.sentdetect.SentenceDetector">SentenceDetector</a></span></dt><dt><span class="section"><a href="#tools.cli.sentdetect.SentenceDetectorTrainer">SentenceDetectorTrainer</a></span></dt><dt><span class="section"><a href="#tools.cli.sentdetect.SentenceDetectorEvaluator">SentenceDetectorEvaluator</a></span></dt><dt><span class="section"><a href="#tools.cli.sentdetect.SentenceDetectorCrossValidator">SentenceDetectorCrossValidator</a></span></dt><dt><span class="section"><a href="#tools.cli.sentdetect.SentenceDetectorConverter">Sentenc
 eDetectorConverter</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.namefind">Namefind</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.namefind.TokenNameFinder">TokenNameFinder</a></span></dt><dt><span class="section"><a href="#tools.cli.namefind.TokenNameFinderTrainer">TokenNameFinderTrainer</a></span></dt><dt><span class="section"><a href="#tools.cli.namefind.TokenNameFinderEvaluator">TokenNameFinderEvaluator</a></span></dt><dt><span class="section"><a href="#tools.cli.namefind.TokenNameFinderCrossValidator">TokenNameFinderCrossValidator</a></span></dt><dt><span class="section"><a href="#tools.cli.namefind.TokenNameFinderConverter">TokenNameFinderConverter</a></span></dt><dt><span class="section"><a href="#tools.cli.namefind.CensusDictionaryCreator">CensusDictionaryCreator</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.postag">Postag</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.postag.POSTag
 ger">POSTagger</a></span></dt><dt><span class="section"><a href="#tools.cli.postag.POSTaggerTrainer">POSTaggerTrainer</a></span></dt><dt><span class="section"><a href="#tools.cli.postag.POSTaggerEvaluator">POSTaggerEvaluator</a></span></dt><dt><span class="section"><a href="#tools.cli.postag.POSTaggerCrossValidator">POSTaggerCrossValidator</a></span></dt><dt><span class="section"><a href="#tools.cli.postag.POSTaggerConverter">POSTaggerConverter</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.lemmatizer">Lemmatizer</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.lemmatizer.LemmatizerME">LemmatizerME</a></span></dt><dt><span class="section"><a href="#tools.cli.lemmatizer.LemmatizerTrainerME">LemmatizerTrainerME</a></span></dt><dt><span class="section"><a href="#tools.cli.lemmatizer.LemmatizerEvaluator">LemmatizerEvaluator</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.chunker">Chunker</a></span></dt><dd><dl><dt><span 
 class="section"><a href="#tools.cli.chunker.ChunkerME">ChunkerME</a></span></dt><dt><span class="section"><a href="#tools.cli.chunker.ChunkerTrainerME">ChunkerTrainerME</a></span></dt><dt><span class="section"><a href="#tools.cli.chunker.ChunkerEvaluator">ChunkerEvaluator</a></span></dt><dt><span class="section"><a href="#tools.cli.chunker.ChunkerCrossValidator">ChunkerCrossValidator</a></span></dt><dt><span class="section"><a href="#tools.cli.chunker.ChunkerConverter">ChunkerConverter</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.parser">Parser</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.parser.Parser">Parser</a></span></dt><dt><span class="section"><a href="#tools.cli.parser.ParserTrainer">ParserTrainer</a></span></dt><dt><span class="section"><a href="#tools.cli.parser.ParserEvaluator">ParserEvaluator</a></span></dt><dt><span class="section"><a href="#tools.cli.parser.ParserConverter">ParserConverter</a></span></dt><dt><span class
 ="section"><a href="#tools.cli.parser.BuildModelUpdater">BuildModelUpdater</a></span></dt><dt><span class="section"><a href="#tools.cli.parser.CheckModelUpdater">CheckModelUpdater</a></span></dt><dt><span class="section"><a href="#tools.cli.parser.TaggerModelReplacer">TaggerModelReplacer</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.entitylinker">Entitylinker</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.entitylinker.EntityLinker">EntityLinker</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.cli.languagemodel">Languagemodel</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.cli.languagemodel.LanguageModel">LanguageModel</a></span></dt></dl></dd></dl></dd></dl></div><div class="list-of-tables"><p><b>List of Tables</b></p><dl><dt>4.1. <a href="#d4e278">Generator elements</a></dt></dl></div>
-	
-
-	
-	
-	<div class="chapter" title="Chapter&nbsp;1.&nbsp;Introduction"><div class="titlepage"><div><div><h2 class="title"><a name="opennlp"></a>Chapter&nbsp;1.&nbsp;Introduction</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#intro.description">Description</a></span></dt><dt><span class="section"><a href="#intro.general.library.structure">General Library Structure</a></span></dt><dt><span class="section"><a href="#intro.api">Application Program Interface (API). Generic Example</a></span></dt><dt><span class="section"><a href="#intro.cli">Command line interface (CLI)</a></span></dt><dd><dl><dt><span class="section"><a href="#intro.cli.description">Description</a></span></dt><dt><span class="section"><a href="#intro.cli.toolslist">List of tools</a></span></dt><dt><span class="section"><a href="#intro.cli.setup">Setting up</a></span></dt><dt><span class="section"><a href="#intro.cli.generic">Generic Example</a></span></dt></dl></dd
 ></dl></div>
-
-    <div class="section" title="Description"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="intro.description"></a>Description</h2></div></div></div>
-        
-        <p>
-        The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
-        It supports the most common NLP tasks, such as tokenization, sentence segmentation,
-        part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
-        These tasks are usually required to build more advanced text processing services.
-        OpenNLP also includes maximum entropy and perceptron-based machine learning.
-        </p>
-
-        <p>
-        The goal of the OpenNLP project is to create a mature toolkit for the above-mentioned tasks.
-        An additional goal is to provide a large number of pre-built models for a variety of languages, as
-        well as the annotated text resources that those models are derived from.
-        </p>
-    </div>
-
-    <div class="section" title="General Library Structure"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="intro.general.library.structure"></a>General Library Structure</h2></div></div></div>
-        
-        <p>The Apache OpenNLP library contains several components, enabling one to build
-            a full natural language processing pipeline. These components
-            include: sentence detector, tokenizer,
-            name finder, document categorizer, part-of-speech tagger, chunker, parser,
-            and coreference resolution. Components contain parts which enable one to execute the
-            respective natural language processing task, to train a model and often also to evaluate a
-            model. Each of these facilities is accessible via its application program
-            interface (API). In addition, a command line interface (CLI) is provided for convenience
-            of experiments and training.
-        </p>
-    </div>
-
-    <div class="section" title="Application Program Interface (API). Generic Example"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="intro.api"></a>Application Program Interface (API). Generic Example</h2></div></div></div>
-        
-        <p>
-            OpenNLP components have similar APIs. Normally, to execute a task,
-            one should provide a model and an input.
-        </p>
-        <p>
-            A model is usually loaded by providing a FileInputStream with a model to a
-            constructor of the model class:
-            </p><pre class="programlisting">
-                    
-InputStream modelIn = <b class="hl-keyword">new</b> FileInputStream(<b class="hl-string"><i style="color:red">"lang-model-name.bin"</i></b>);
-
-<b class="hl-keyword">try</b> {
-  SomeModel model = <b class="hl-keyword">new</b> SomeModel(modelIn);
-}
-<b class="hl-keyword">catch</b> (IOException e) {
-  <i class="hl-comment" style="color: silver">//handle the exception</i>
-}
-<b class="hl-keyword">finally</b> {
-  <b class="hl-keyword">if</b> (null != modelIn) {
-    <b class="hl-keyword">try</b> {
-      modelIn.close();
-    }
-    <b class="hl-keyword">catch</b> (IOException e) {
-    }
-  }
-}
-            </pre><p>
-        </p>
-        <p>
-        After the model is loaded the tool itself can be instantiated.
-        </p><pre class="programlisting">
-                
-ToolName toolName = <b class="hl-keyword">new</b> ToolName(model);
-        </pre><p>
-        After the tool is instantiated, the processing task can be executed. The input and the
-        output formats are specific to the tool, but often the output is an array of String,
-        and the input is a String or an array of String.
-        </p><pre class="programlisting">
-                
-String output[] = toolName.executeTask(<b class="hl-string"><i style="color:red">"This is a sample text."</i></b>);
-        </pre><p>
-        </p>
-    </div>
-
-    <div class="section" title="Command line interface (CLI)"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="intro.cli"></a>Command line interface (CLI)</h2></div></div></div>
-        
-        <div class="section" title="Description"><div class="titlepage"><div><div><h3 class="title"><a name="intro.cli.description"></a>Description</h3></div></div></div>
-            
-            <p>
-                OpenNLP provides a command line script that serves as a single entry point to all
-                included tools. The script is located in the bin directory of the OpenNLP binary
-                distribution. Two versions are included: opennlp.bat for Windows, and opennlp for
-                Linux or compatible systems.
-            </p>
-        </div>
-        
-        <div class="section" title="List of tools"><div class="titlepage"><div><div><h3 class="title"><a name="intro.cli.toolslist"></a>List of tools</h3></div></div></div>
-            
-            <p>
-               	The list of command line tools for Apache OpenNLP 1.7.2,
-               	as well as a description of its arguments, is available at section <a class="xref" href="#tools.cli" title="Chapter&nbsp;16.&nbsp;The Command Line Interface">Chapter&nbsp;16, <i>The Command Line Interface</i></a>.
-            </p>
-        </div>
-
-        <div class="section" title="Setting up"><div class="titlepage"><div><div><h3 class="title"><a name="intro.cli.setup"></a>Setting up</h3></div></div></div>
-            
-            <p>
-                The OpenNLP script uses the JAVA_CMD and JAVA_HOME variables to determine which
-                command to use to start the Java virtual machine.
-            </p>
-            <p>
-                The OpenNLP script uses the OPENNLP_HOME variable to determine the location of the
-                binary distribution of OpenNLP. It is recommended to point this variable to the binary
-                distribution of the current OpenNLP version and to update the PATH variable to include
-                $OPENNLP_HOME/bin or %OPENNLP_HOME%\bin.
-            </p>
-            <p>
-                This configuration allows OpenNLP to be called conveniently. The examples below
-                assume it is in place.
-            </p>
-        </div>
-
-        <div class="section" title="Generic Example"><div class="titlepage"><div><div><h3 class="title"><a name="intro.cli.generic"></a>Generic Example</h3></div></div></div>
-            
-
-            <p>
-                Apache OpenNLP provides a common command line script to access all its tools:
-                </p><pre class="screen">
-                
-$ opennlp
-                 </pre><p>
-                This script prints the current version of the library and lists all available tools:
-                </p><pre class="screen">
-                
-OpenNLP &lt;VERSION&gt;. Usage: opennlp TOOL
-where TOOL is one of:
-  Doccat                            learnable document categorizer
-  DoccatTrainer                     trainer for the learnable document categorizer
-  DoccatConverter                   converts leipzig data format to native OpenNLP format
-  DictionaryBuilder                 builds a new dictionary
-  SimpleTokenizer                   character class tokenizer
-  TokenizerME                       learnable tokenizer
-  TokenizerTrainer                  trainer for the learnable tokenizer
-  TokenizerMEEvaluator              evaluator for the learnable tokenizer
-  TokenizerCrossValidator           K-fold cross validator for the learnable tokenizer
-  TokenizerConverter                converts foreign data formats (namefinder,conllx,pos) to native OpenNLP format
-  DictionaryDetokenizer
-  SentenceDetector                  learnable sentence detector
-  SentenceDetectorTrainer           trainer for the learnable sentence detector
-  SentenceDetectorEvaluator         evaluator for the learnable sentence detector
-  SentenceDetectorCrossValidator    K-fold cross validator for the learnable sentence detector
-  SentenceDetectorConverter         converts foreign data formats (namefinder,conllx,pos) to native OpenNLP format
-  TokenNameFinder                   learnable name finder
-  TokenNameFinderTrainer            trainer for the learnable name finder
-  TokenNameFinderEvaluator          Measures the performance of the NameFinder model with the reference data
-  TokenNameFinderCrossValidator     K-fold cross validator for the learnable Name Finder
-  TokenNameFinderConverter          converts foreign data formats (bionlp2004,conll03,conll02,ad) to native OpenNLP format
-  CensusDictionaryCreator           Converts 1990 US Census names into a dictionary
-  POSTagger                         learnable part of speech tagger
-  POSTaggerTrainer                  trains a model for the part-of-speech tagger
-  POSTaggerEvaluator                Measures the performance of the POS tagger model with the reference data
-  POSTaggerCrossValidator           K-fold cross validator for the learnable POS tagger
-  POSTaggerConverter                converts conllx data format to native OpenNLP format
-  ChunkerME                         learnable chunker
-  ChunkerTrainerME                  trainer for the learnable chunker
-  ChunkerEvaluator                  Measures the performance of the Chunker model with the reference data
-  ChunkerCrossValidator             K-fold cross validator for the chunker
-  ChunkerConverter                  converts ad data format to native OpenNLP format
-  Parser                            performs full syntactic parsing
-  ParserTrainer                     trains the learnable parser
-  ParserEvaluator                   Measures the performance of the Parser model with the reference data
-  BuildModelUpdater                 trains and updates the build model in a parser model
-  CheckModelUpdater                 trains and updates the check model in a parser model
-  TaggerModelReplacer               replaces the tagger model in a parser model
-All tools print help when invoked with help parameter
-Example: opennlp SimpleTokenizer help
-
-                </pre><p>
-            </p>
-            <p>OpenNLP tools have a similar command line structure and options. To discover a tool's
-                options, run it with no parameters:
-                </p><pre class="screen">
-                
-$ opennlp ToolName
-                 </pre><p>
-                The tool will output two blocks of help.
-            </p>
-            <p>
-                The first block describes the general structure of this tool command line:
-                </p><pre class="screen">
-                
-Usage: opennlp TokenizerTrainer[.namefinder|.conllx|.pos] [-abbDict path] ...  -model modelFile ...
-                </pre><p>
-                The general structure of this tool command line includes the obligatory tool name
-                (TokenizerTrainer), the optional format parameters ([.namefinder|.conllx|.pos]),
-                the optional parameters ([-abbDict path] ...), and the obligatory parameters
-                (-model modelFile ...).
-            </p>
-            <p>
-                The format parameters enable direct processing of non-native data without conversion.
-                Each format might have its own parameters, which are displayed if the tool is
-                executed without parameters or with the help parameter:
-                </p><pre class="screen">
-                
-$ opennlp TokenizerTrainer.conllx help
-                </pre><p>
-                </p><pre class="screen">
-                
-Usage: opennlp TokenizerTrainer.conllx [-abbDict path] [-alphaNumOpt isAlphaNumOpt] ...
-
-Arguments description:
-        -abbDict path
-                abbreviation dictionary in XML format.
-        ...
-                </pre><p>
-                To switch the tool to a specific format, add a dot and the format name after
-                the tool name:
-                </p><pre class="screen">
-                
-$ opennlp TokenizerTrainer.conllx -model en-pos.bin ...
-                </pre><p>
-            </p>
-            <p>
-                The second block of the help message describes the individual arguments:
-                </p><pre class="screen">
-                
-Arguments description:
-        -type maxent|perceptron|perceptron_sequence
-                The type of the token name finder model. One of maxent|perceptron|perceptron_sequence.
-        -dict dictionaryPath
-                The XML tag dictionary file
-        ...
-                </pre><p>
-            </p>
-            <p>
-                Most processing tools need to be given at least a model:
-                </p><pre class="screen">
-                
-$ opennlp ToolName lang-model-name.bin
-                 </pre><p>
-                When a tool is executed this way, the model is loaded and the tool waits for
-                input on standard input. This input is processed and the result is printed to
-                standard output.
-            </p>
-            <p>An alternative, and in fact the most commonly used, way is to use console input and
-                output redirection to provide an input file and an output file:
-                </p><pre class="screen">
-            
-$ opennlp ToolName lang-model-name.bin &lt; input.txt &gt; output.txt
-                </pre><p>
-            </p>
-            <p>
-                Most model training tools need to be given a model name first,
-                then optionally some training options (such as model type, number of iterations),
-                and then the data.
-            </p>
-            <p>
-                A model name is just a file name.
-            </p>
-            <p>
-                Training options often include the number of iterations, the cutoff,
-                an abbreviations dictionary or something else. Sometimes it is possible to provide these
-                options via a training options file. In this case the options given on the command
-                line are ignored and the ones from the file are used.
-            </p>
-            <p>
-                For the data, one has to specify its location (a file name) and often
-                the language and encoding.
-            </p>
-            <p>
-                A generic example of a command line to launch a tool trainer might be:
-                </p><pre class="screen">
-                
-$ opennlp ToolNameTrainer -model en-model-name.bin -lang en -data input.train -encoding UTF-8
-                 </pre><p>
-                or with a format:
-                </p><pre class="screen">
-                
-$ opennlp ToolNameTrainer.conll03 -model en-model-name.bin -lang en -data input.train \
-                                  -types per -encoding UTF-8
-                 </pre><p>
-            </p>
-            <p>Most tools for model evaluation are similar to those for task execution, and
-                need to be given a model name first, then optionally some evaluation options (such
-                as whether to print misclassified samples), and then the test data. A generic
-                example of a command line to launch an evaluation tool might be:
-                </p><pre class="screen">
-                
-$ opennlp ToolNameEvaluator -model en-model-name.bin -lang en -data input.test -encoding UTF-8
-                 </pre><p>
-            </p>
-        </div>
-    </div>
-
-</div>
-	<div class="chapter" title="Chapter&nbsp;2.&nbsp;Sentence Detector"><div class="titlepage"><div><div><h2 class="title"><a name="tools.sentdetect"></a>Chapter&nbsp;2.&nbsp;Sentence Detector</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#tools.sentdetect.detection">Sentence Detection</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.sentdetect.detection.cmdline">Sentence Detection Tool</a></span></dt><dt><span class="section"><a href="#tools.sentdetect.detection.api">Sentence Detection API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.sentdetect.training">Sentence Detector Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.sentdetect.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.sentdetect.training.api">Training API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.sentdetect.eval">Evaluation</a></span></dt>
 <dd><dl><dt><span class="section"><a href="#tools.sentdetect.eval.tool">Evaluation Tool</a></span></dt></dl></dd></dl></div>
-
-	
-
-	<div class="section" title="Sentence Detection"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.sentdetect.detection"></a>Sentence Detection</h2></div></div></div>
-		
-		<p>
-		The OpenNLP Sentence Detector can detect whether or not a punctuation character
-		marks the end of a sentence. In this sense a sentence is defined
-		as the longest whitespace-trimmed character sequence between two punctuation
-		marks. The first and last sentences are exceptions to this rule. The first
-		non-whitespace character is assumed to be the beginning of a sentence, and the
-		last non-whitespace character is assumed to be the end of a sentence.
-		The sample text below should be segmented into its sentences.
-		</p><pre class="screen">
-				
-Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is
-chairman of Elsevier N.V., the Dutch publishing group. Rudolph Agnew, 55 years
-old and former chairman of Consolidated Gold Fields PLC, was named a director of this
-British industrial conglomerate.
-		</pre><p>
-		After detecting the sentence boundaries each sentence is written on its own line.
-		</p><pre class="screen">
-				
-Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
-Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
-Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC,
-    was named a director of this British industrial conglomerate.
-		</pre><p>
-		Usually Sentence Detection is done before the text is tokenized and that's the way the pre-trained models on the web site are trained,
-		but it is also possible to perform tokenization first and let the Sentence Detector process the already tokenized text.
-		The OpenNLP Sentence Detector cannot identify sentence boundaries based on the contents of the sentence. A prominent example is the first sentence of an article, where the title is mistakenly identified as the first part of that sentence.
-		Most components in OpenNLP expect input which is segmented into sentences.
-		</p>
-		
-		<div class="section" title="Sentence Detection Tool"><div class="titlepage"><div><div><h3 class="title"><a name="tools.sentdetect.detection.cmdline"></a>Sentence Detection Tool</h3></div></div></div>
-		
-		<p>
-		The easiest way to try out the Sentence Detector is the command line tool. The tool is only intended for demonstration and testing.
-		Download the English sentence detector model and start the Sentence Detector Tool with this command:
-        </p><pre class="screen">
-        
-$ opennlp SentenceDetector en-sent.bin
-		</pre><p>
-		Just copy the sample text from above to the console. The Sentence Detector will read it and echo one sentence per line to the console.
-		Usually the input is read from a file and the output is redirected to another file. This can be achieved with the following command.
-		</p><pre class="screen">
-				
-$ opennlp SentenceDetector en-sent.bin &lt; input.txt &gt; output.txt
-		</pre><p>
-		For the English sentence model from the website the input text should not be tokenized.
-		</p>
-		</div>
-		<div class="section" title="Sentence Detection API"><div class="titlepage"><div><div><h3 class="title"><a name="tools.sentdetect.detection.api"></a>Sentence Detection API</h3></div></div></div>
-		
-		<p>
-		The Sentence Detector can be easily integrated into an application via its API.
-		To instantiate the Sentence Detector the sentence model must be loaded first.
-		</p><pre class="programlisting">
-				
-InputStream modelIn = <b class="hl-keyword">new</b> FileInputStream(<b class="hl-string"><i style="color:red">"en-sent.bin"</i></b>);
-SentenceModel model = null;
-
-<b class="hl-keyword">try</b> {
-  model = <b class="hl-keyword">new</b> SentenceModel(modelIn);
-}
-<b class="hl-keyword">catch</b> (IOException e) {
-  e.printStackTrace();
-}
-<b class="hl-keyword">finally</b> {
-  <b class="hl-keyword">if</b> (modelIn != null) {
-    <b class="hl-keyword">try</b> {
-      modelIn.close();
-    }
-    <b class="hl-keyword">catch</b> (IOException e) {
-    }
-  }
-}
-		</pre><p>
-		After the model is loaded the SentenceDetectorME can be instantiated.
-		</p><pre class="programlisting">
-				
-SentenceDetectorME sentenceDetector = <b class="hl-keyword">new</b> SentenceDetectorME(model);
-		</pre><p>
-		The Sentence Detector can output an array of Strings, where each String is one sentence.
-				</p><pre class="programlisting">
-				
-String sentences[] = sentenceDetector.sentDetect(<b class="hl-string"><i style="color:red">"  First sentence. Second sentence. "</i></b>);
-		</pre><p>
-		The result array now contains two entries. The first String is "First sentence." and the
-        second String is "Second sentence." The whitespace before, between and after the sentences in the input String is removed.
-		The API also offers a method which simply returns the span of the sentence in the input string.
-		</p><pre class="programlisting">
-				
-Span sentences[] = sentenceDetector.sentPosDetect(<b class="hl-string"><i style="color:red">"  First sentence. Second sentence. "</i></b>);
-		</pre><p>
-		The result array again contains two entries. The first span begins at index 2 and ends at
-            17. The second span begins at 18 and ends at 34. The utility method Span.getCoveredText can be used to create a substring which covers only the characters in the span.
-		</p>
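As a plain-Java illustration of the offsets described above, the following sketch does not use OpenNLP at all. It assumes that the end offset of a span is exclusive, as in String.substring, and the class name SpanOffsets and the method covered are made up for this example.

```java
public class SpanOffsets {

    // Returns the text between begin (inclusive) and end (exclusive).
    // Reading the end offset as exclusive is an assumption of this sketch.
    static String covered(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String input = "  First sentence. Second sentence. ";

        // Offsets taken from the text above: 2..17 and 18..34.
        System.out.println("[" + covered(input, 2, 17) + "]");
        System.out.println("[" + covered(input, 18, 34) + "]");
    }
}
```

Under that assumption the two offset pairs select the two sentences without the surrounding whitespace.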
-		</div>
-	</div>
-	<div class="section" title="Sentence Detector Training"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.sentdetect.training"></a>Sentence Detector Training</h2></div></div></div>
-		
-		<p></p>
-		<div class="section" title="Training Tool"><div class="titlepage"><div><div><h3 class="title"><a name="tools.sentdetect.training.tool"></a>Training Tool</h3></div></div></div>
-		
-		<p>
-		OpenNLP has a command line tool which is used to train the models available from the model
-		download page on various corpora. The data must be converted to the OpenNLP Sentence Detector
-		training format, which is one sentence per line, exactly like the output in the sample above.
-		An empty line indicates a document boundary.
-		In case the document boundary is unknown, it's recommended to have an empty line every few tens
-		of sentences.
-		Usage of the tool:
-		</p><pre class="screen">
-				
-$ opennlp SentenceDetectorTrainer
-Usage: opennlp SentenceDetectorTrainer[.namefinder|.conllx|.pos] [-abbDict path] \
-               [-params paramsFile] [-iterations num] [-cutoff num] -model modelFile \
-               -lang language -data sampleData [-encoding charsetName]
-
-Arguments description:
-        -abbDict path
-                abbreviation dictionary in XML format.
-        -params paramsFile
-                training parameters file.
-        -iterations num
-                number of training iterations, ignored if -params is used.
-        -cutoff num
-                minimal number of times a feature must be seen, ignored if -params is used.
-        -model modelFile
-                output model file.
-        -lang language
-                language which is being processed.
-        -data sampleData
-                data to be used, usually a file name.
-        -encoding charsetName
-                encoding for reading and writing text, if absent the system default is used.
-	</pre><p>
-		To train an English sentence detector use the following command:
-        </p><pre class="screen">
-				
-$ opennlp SentenceDetectorTrainer -model en-sent.bin -lang en -data en-sent.train -encoding UTF-8
-                        
-        </pre><p>
-            It should produce the following output:
-            </p><pre class="screen">
-                
-Indexing events using cutoff of 5
-
-	Computing event counts...  done. 4883 events
-	Indexing...  done.
-Sorting and merging events... done. Reduced 4883 events to 2945.
-Done indexing.
-Incorporating indexed data for training...  
-done.
-	Number of Event Tokens: 2945
-	    Number of Outcomes: 2
-	  Number of Predicates: 467
-...done.
-Computing model parameters...
-Performing 100 iterations.
-  1:  .. loglikelihood=-3384.6376826743144	0.38951464263772273
-  2:  .. loglikelihood=-2191.9266688597672	0.9397911120212984
-  3:  .. loglikelihood=-1645.8640771555981	0.9643661683391358
-  4:  .. loglikelihood=-1340.386303774519	0.9739913987302887
-  5:  .. loglikelihood=-1148.4141548519624	0.9748105672742167
-
- ...&lt;skipping a bunch of iterations&gt;...
-
- 95:  .. loglikelihood=-288.25556805874436	0.9834118369854598
- 96:  .. loglikelihood=-287.2283680343481	0.9834118369854598
- 97:  .. loglikelihood=-286.2174830344526	0.9834118369854598
- 98:  .. loglikelihood=-285.222486981048	0.9834118369854598
- 99:  .. loglikelihood=-284.24296917223916	0.9834118369854598
-100:  .. loglikelihood=-283.2785335773966	0.9834118369854598
-Wrote sentence detector model.
-Path: en-sent.bin
-
-		</pre><p>
-		</p>
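To make the training format concrete (one sentence per line, an empty line as document boundary), here is a minimal plain-Java sketch that groups such lines into documents. It is only an illustration of the format, not the reader OpenNLP uses internally; the class name SentenceTrainingFormat and the method readDocuments are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceTrainingFormat {

    // Groups lines into documents: each non-empty line is one sentence,
    // and an empty line closes the current document.
    static List<List<String>> readDocuments(List<String> lines) {
        List<List<String>> documents = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // Document boundary: store the sentences collected so far.
                if (!current.isEmpty()) {
                    documents.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line.trim());
            }
        }
        // The last document may not be followed by an empty line.
        if (!current.isEmpty()) {
            documents.add(current);
        }
        return documents;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.",
            "Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.",
            "",
            "Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate.");

        List<List<String>> documents = readDocuments(lines);
        System.out.println(documents.size() + " documents");
        for (List<String> document : documents) {
            System.out.println(document.size() + " sentences");
        }
    }
}
```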
-		</div>
-		<div class="section" title="Training API"><div class="titlepage"><div><div><h3 class="title"><a name="tools.sentdetect.training.api"></a>Training API</h3></div></div></div>
-		
-		<p>
-		The Sentence Detector also offers an API to train a new sentence detection model.
-		Basically three steps are necessary to train it:
-		</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
-					<p>The application must open a sample data stream</p>
-				</li><li class="listitem">
-					<p>Call the SentenceDetectorME.train method</p>
-				</li><li class="listitem">
-					<p>Save the SentenceModel to a file or directly use it</p>
-				</li></ul></div><p>
-			The following sample code illustrates these steps:
-					</p><pre class="programlisting">
-				
-Charset charset = Charset.forName(<b class="hl-string"><i style="color:red">"UTF-8"</i></b>);				
-ObjectStream&lt;String&gt; lineStream =
-  <b class="hl-keyword">new</b> PlainTextByLineStream(<b class="hl-keyword">new</b> FileInputStream(<b class="hl-string"><i style="color:red">"en-sent.train"</i></b>), charset);
-ObjectStream&lt;SentenceSample&gt; sampleStream = <b class="hl-keyword">new</b> SentenceSampleStream(lineStream);
-
-SentenceModel model;
-
-<b class="hl-keyword">try</b> {
-  model = SentenceDetectorME.train(<b class="hl-string"><i style="color:red">"en"</i></b>, sampleStream, true, null, TrainingParameters.defaultParams());
-}
-<b class="hl-keyword">finally</b> {
-  sampleStream.close();
-}
-
-OutputStream modelOut = null;
-<b class="hl-keyword">try</b> {
-  modelOut = <b class="hl-keyword">new</b> BufferedOutputStream(<b class="hl-keyword">new</b> FileOutputStream(modelFile));
-  model.serialize(modelOut);
-} <b class="hl-keyword">finally</b> {
-  <b class="hl-keyword">if</b> (modelOut != null) 
-     modelOut.close();      
-}
-		</pre><p>
-		</p>
-		</div>
-	</div>
-	<div class="section" title="Evaluation"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.sentdetect.eval"></a>Evaluation</h2></div></div></div>
-		
-		<p>
-		</p>
-		<div class="section" title="Evaluation Tool"><div class="titlepage"><div><div><h3 class="title"><a name="tools.sentdetect.eval.tool"></a>Evaluation Tool</h3></div></div></div>
-			
-			<p>
-                The following command shows how the evaluator tool can be run:
-                </p><pre class="screen">
-				
-$ opennlp SentenceDetectorEvaluator -model en-sent.bin -data en-sent.eval -encoding UTF-8
-
-Loading model ... done
-Evaluating ... done
-
-Precision: 0.9465737514518002
-Recall: 0.9095982142857143
-F-Measure: 0.9277177006260672
-                </pre><p>
-                The en-sent.eval file has the same format as the training data.
-			</p>
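The three figures in the output above are related: assuming the F-Measure is the usual balanced F1 score, it is the harmonic mean of precision and recall. The sketch below recomputes it in plain Java from the two other values; the class name FMeasure and the method fMeasure are made up for this example and are not part of the OpenNLP API.

```java
public class FMeasure {

    // Balanced F-measure (F1): harmonic mean of precision and recall.
    static double fMeasure(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid division by zero when both values are zero
        }
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Precision and recall copied from the evaluator output above.
        double precision = 0.9465737514518002;
        double recall = 0.9095982142857143;
        System.out.println("F-Measure: " + fMeasure(precision, recall));
    }
}
```

With the precision and recall shown above the result is approximately 0.9277, in line with the F-Measure line of the tool output.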
-		</div>
-	</div>
-</div>
-	<div class="chapter" title="Chapter&nbsp;3.&nbsp;Tokenizer"><div class="titlepage"><div><div><h2 class="title"><a name="tools.tokenizer"></a>Chapter&nbsp;3.&nbsp;Tokenizer</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#tools.tokenizer.introduction">Tokenization</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.tokenizer.cmdline">Tokenizer Tools</a></span></dt><dt><span class="section"><a href="#tools.tokenizer.api">Tokenizer API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.tokenizer.training">Tokenizer Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.tokenizer.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.tokenizer.training.api">Training API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.tokenizer.detokenizing">Detokenizing</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.tokenizer.detokenizing.api">Detokenizing API</a></span></dt><dt><span class="section"><a href="#tools.tokenizer.detokenizing.dict">Detokenizer Dictionary</a></span></dt></dl></dd></dl></div>
-
-	
-
-	<div class="section" title="Tokenization"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.tokenizer.introduction"></a>Tokenization</h2></div></div></div>
-		
-		<p>
-			The OpenNLP Tokenizers segment an input character sequence into
-			tokens. Tokens are usually
-			words, punctuation, numbers, etc.
-
-			</p><pre class="screen">
-			
-Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
-Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
-Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields
-    PLC, was named a director of this British industrial conglomerate.
-			
-		    </pre><p>
-
-			The following result shows the individual tokens in a whitespace
-			separated representation.
-
-			</p><pre class="screen">
-			
-Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
-Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
-Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC ,
-    was named a director of this British industrial conglomerate .
-A form of asbestos once used to make Kent cigarette filters has caused a high
-    percentage of cancer deaths among a group of workers exposed to it more than 30 years ago ,
-    researchers reported . 
-			
-		 	</pre><p>
-
-			OpenNLP offers multiple tokenizer implementations:
-			</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
-					<p>Whitespace Tokenizer - A whitespace tokenizer, non whitespace
-						sequences are identified as tokens</p>
-				</li><li class="listitem">
-					<p>Simple Tokenizer - A character class tokenizer, sequences of
-						the same character class are tokens</p>
-				</li><li class="listitem">
-					<p>Learnable Tokenizer - A maximum entropy tokenizer, detects
-						token boundaries based on a probability model</p>
-				</li></ul></div><p>
-
-			Most part-of-speech taggers, parsers and so on, work with text
-			tokenized in this manner. It is important to ensure that your
-			tokenizer
-			produces tokens of the type expected by your later text
-			processing
-			components.
-		</p>
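To illustrate what "sequences of the same character class are tokens" means, here is a deliberately simplified plain-Java sketch of a character class tokenizer. It is not the OpenNLP SimpleTokenizer implementation and may split some inputs differently (for example, it merges adjacent punctuation characters into one token); the class name CharClassTokenizerSketch and the four character classes are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class CharClassTokenizerSketch {

    enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Emits a token whenever the character class changes; whitespace
    // separates tokens but is never part of one.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start of the current token, -1 if there is none
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass cc = classify(text.charAt(i));
            if (cc != current) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                start = (cc == CharClass.WHITESPACE) ? -1 : i;
                current = cc;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("An input sample sentence."));
    }
}
```

A whitespace tokenizer would return "sentence." as a single token for the same input, which is why the choice of tokenizer has to match what later components expect.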
-
-		<p>
-			With OpenNLP (as with many systems), tokenization is a two-stage
-			process:
-			first, sentence boundaries are identified, then tokens within
-			each
-			sentence are identified.
-		</p>
-	
-	<div class="section" title="Tokenizer Tools"><div class="titlepage"><div><div><h3 class="title"><a name="tools.tokenizer.cmdline"></a>Tokenizer Tools</h3></div></div></div>
-		
-		<p>The easiest way to try out the tokenizers is to use the command line
-			tools. The tools are only intended for demonstration and testing.
-		</p>
-		<p>There are two tools, one for the Simple Tokenizer and one for
-			the learnable tokenizer. A command line tool for the Whitespace
-			Tokenizer does not exist, because the whitespace separated output
-			would be identical to the input.</p>
-		<p>
-			The following command shows how to use the Simple Tokenizer Tool.
-
-			</p><pre class="screen">
-			
-$ opennlp SimpleTokenizer
-		    </pre><p>
-			To use the learnable tokenizer download the English token model from
-			our website.
-			</p><pre class="screen">
-			
-$ opennlp TokenizerME en-token.bin
-		    </pre><p>
-			To test the tokenizer copy the sample from above to the console. The
-			whitespace separated tokens will be written back to the
-			console.
-		</p>
-		<p>
-			Usually the input is read from a file and written to a file.
-			</p><pre class="screen">
-			
-$ opennlp TokenizerME en-token.bin &lt; article.txt &gt; article-tokenized.txt
-		    </pre><p>
-			It can be done in the same way for the Simple Tokenizer.
-		</p>
-		<p>
-			Since most text comes truly raw and doesn't have sentence boundaries
-			and such, it's possible to create a pipeline which first performs sentence
-			boundary detection and then tokenization. The following sample illustrates
-			that.
-			</p><pre class="screen">
-			
-$ opennlp SentenceDetector sentdetect.model &lt; article.txt | opennlp TokenizerME tokenize.model | more
-Loading model ... Loading model ... done
-done
-Showa Shell gained 20 to 1,570 and Mitsubishi Oil rose 50 to 1,500.
-Sumitomo Metal Mining fell five yen to 692 and Nippon Mining added 15 to 960 .
-Among other winners Wednesday was Nippon Shokubai , which was up 80 at 2,410 .
-Marubeni advanced 11 to 890 .
-London share prices were bolstered largely by continued gains on Wall Street and technical 
-    factors affecting demand for London 's blue-chip stocks .
-...etc...
-		 </pre><p>
-			Of course this is all on the command line. Many people use the models
-			directly in their Java code by creating SentenceDetector and
-			Tokenizer objects and calling their methods as appropriate. The
-			following section will explain how the Tokenizers can be used
-			directly from Java.
-		</p>
-	</div>
-
-	<div class="section" title="Tokenizer API"><div class="titlepage"><div><div><h3 class="title"><a name="tools.tokenizer.api"></a>Tokenizer API</h3></div></div></div>
-		
-		<p>
-			The Tokenizers can be integrated into an application via their API.
-			The shared instance of the WhitespaceTokenizer can be retrieved from a
-			static field WhitespaceTokenizer.INSTANCE. The shared instance of the
-			SimpleTokenizer can be retrieved in the same way from
-			SimpleTokenizer.INSTANCE.
-			To instantiate the TokenizerME (the learnable tokenizer) a Token Model
-			must be created first. The following code sample shows how a model
-			can be loaded.
-			</p><pre class="programlisting">
-			
-InputStream modelIn = <b class="hl-keyword">new</b> FileInputStream(<b class="hl-string"><i style="color:red">"en-token.bin"</i></b>);
-TokenizerModel model = null;
-
-<b class="hl-keyword">try</b> {
-  model = <b class="hl-keyword">new</b> TokenizerModel(modelIn);
-}
-<b class="hl-keyword">catch</b> (IOException e) {
-  e.printStackTrace();
-}
-<b class="hl-keyword">finally</b> {
-  <b class="hl-keyword">if</b> (modelIn != null) {
-    <b class="hl-keyword">try</b> {
-      modelIn.close();
-    }
-    <b class="hl-keyword">catch</b> (IOException e) {
-    }
-  }
-}
-		 </pre><p>
-			After the model is loaded the TokenizerME can be instantiated.
-			</p><pre class="programlisting">
-			
-Tokenizer tokenizer = <b class="hl-keyword">new</b> TokenizerME(model);
-		 </pre><p>
-			The tokenizer offers two tokenize methods. Both expect an input
-			String object which contains the untokenized text. If possible it
-			should be a sentence, but depending on the training of the learnable
-			tokenizer this is not required. The first returns an array of
-			Strings, where each String is one token.
-			</p><pre class="programlisting">
-			
-String tokens[] = tokenizer.tokenize(<b class="hl-string"><i style="color:red">"An input sample sentence."</i></b>);
-		 </pre><p>
-			The output will be an array with these tokens.
-			</p><pre class="programlisting">
-			
-"An", "input", "sample", "sentence", "."
-		 </pre><p>
-			The second method, tokenizePos, returns an array of Spans. Each Span
-			contains the begin and end character offsets of the token in the input
-			String.
-			</p><pre class="programlisting">
-			
-Span tokenSpans[] = tokenizer.tokenizePos(<b class="hl-string"><i style="color:red">"An input sample sentence."</i></b>);		
-			</pre><p>
-			The tokenSpans array now contains 5 elements. To get the text for one
-			span call Span.getCoveredText which takes a span and the input text.
-
-			The TokenizerME is able to output the probabilities for the detected
-			tokens. The getTokenProbabilities method must be called directly
-			after one of the tokenize methods was called.
-			</p><pre class="programlisting">
-			
-TokenizerME tokenizer = ...
-
-String tokens[] = tokenizer.tokenize(...);
-<b class="hl-keyword">double</b> tokenProbs[] = tokenizer.getTokenProbabilities();
-			</pre><p>
-			The tokenProbs array now contains one double value per token. The
-			value is between 0 and 1, where 1 is the highest possible probability
-			and 0 the lowest possible probability.
-		</p>
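To show how token strings relate to character offsets, the following plain-Java sketch locates the five tokens of the sample sentence in the input text. It is only an illustration and does not use OpenNLP; it assumes exclusive end offsets, and the class name TokenOffsets and the method locate are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class TokenOffsets {

    // Finds each token in the text, searching left to right, and returns
    // {begin, end} pairs where end is exclusive.
    static int[][] locate(String text, List<String> tokens) {
        int[][] spans = new int[tokens.size()][];
        int from = 0;
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            int begin = text.indexOf(token, from);
            if (begin < 0) {
                throw new IllegalArgumentException("Token not found: " + token);
            }
            int end = begin + token.length();
            spans[i] = new int[] {begin, end};
            from = end; // continue searching after this token
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "An input sample sentence.";
        List<String> tokens = Arrays.asList("An", "input", "sample", "sentence", ".");
        for (int[] span : locate(text, tokens)) {
            System.out.println(span[0] + ".." + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```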
-	</div>
-	</div>
-	
-	<div class="section" title="Tokenizer Training"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.tokenizer.training"></a>Tokenizer Training</h2></div></div></div>
-		
-			
-		<div class="section" title="Training Tool"><div class="titlepage"><div><div><h3 class="title"><a name="tools.tokenizer.training.tool"></a>Training Tool</h3></div></div></div>
-			
-			<p>
-				OpenNLP has a command line tool which is used to train the models
-				available from the model download page on various corpora. The data
-				can be converted to the OpenNLP Tokenizer training format or used directly.
-                The OpenNLP format contains one sentence per line. Tokens are either separated by a
-                whitespace or by a special &lt;SPLIT&gt; tag.
-				
-				The following sample shows the sample from above in the correct format.
-				</p><pre class="screen">
-			    
-Pierre Vinken&lt;SPLIT&gt;, 61 years old&lt;SPLIT&gt;, will join the board as a nonexecutive director Nov. 29&lt;SPLIT&gt;.
-Mr. Vinken is chairman of Elsevier N.V.&lt;SPLIT&gt;, the Dutch publishing group&lt;SPLIT&gt;.
-Rudolph Agnew&lt;SPLIT&gt;, 55 years old and former chairman of Consolidated Gold Fields PLC&lt;SPLIT&gt;,
-    was named a nonexecutive director of this British industrial conglomerate&lt;SPLIT&gt;.
-			    </pre><p>
-			    Usage of the tool:
-			    </p><pre class="screen">
-			    
-$ opennlp TokenizerTrainer
-Usage: opennlp TokenizerTrainer[.namefinder|.conllx|.pos] [-abbDict path] \
-                [-alphaNumOpt isAlphaNumOpt] [-params paramsFile] [-iterations num] \
-                [-cutoff num] -model modelFile -lang language -data sampleData \
-                [-encoding charsetName]
-
-Arguments description:
-        -abbDict path
-                abbreviation dictionary in XML format.
-        -alphaNumOpt isAlphaNumOpt
-                Optimization flag to skip alpha numeric tokens for further tokenization
-        -params paramsFile
-                training parameters file.
-        -iterations num
-                number of training iterations, ignored if -params is used.
-        -cutoff num
-                minimal number of times a feature must be seen, ignored if -params is used.
-        -model modelFile
-                output model file.
-        -lang language
-                language which is being processed.
-        -data sampleData
-                data to be used, usually a file name.
-        -encoding charsetName
-                encoding for reading and writing text, if absent the system default is used.
-                </pre><p>
-				To train the English tokenizer use the following command:
-				</p><pre class="screen">
-			    
-$ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt -lang en -data en-token.train -encoding UTF-8
-
-Indexing events using cutoff of 5
-
-	Computing event counts...  done. 262271 events
-	Indexing...  done.
-Sorting and merging events... done. Reduced 262271 events to 59060.
-Done indexing.
-Incorporating indexed data for training...  
-done.
-	Number of Event Tokens: 59060
-	    Number of Outcomes: 2
-	  Number of Predicates: 15695
-...done.
-Computing model parameters...
-Performing 100 iterations.
-  1:  .. loglikelihood=-181792.40419263614	0.9614292087192255
-  2:  .. loglikelihood=-34208.094253153664	0.9629238459456059
-  3:  .. loglikelihood=-18784.123872910015	0.9729211388220581
-  4:  .. loglikelihood=-13246.88162585859	0.9856103038460219
-  5:  .. loglikelihood=-10209.262670265718	0.9894422181636552
-
- ...&lt;skipping a bunch of iterations&gt;...
-
- 95:  .. loglikelihood=-769.2107474529454	0.999511955191386
- 96:  .. loglikelihood=-763.8891914534009	0.999511955191386
- 97:  .. loglikelihood=-758.6685383254891	0.9995157680414533
- 98:  .. loglikelihood=-753.5458314695236	0.9995157680414533
- 99:  .. loglikelihood=-748.5182305519613	0.9995157680414533
-100:  .. loglikelihood=-743.5830058068038	0.9995157680414533
-Wrote tokenizer model.
-Path: en-token.bin
-				</pre><p>
-			</p>
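The training format described above (one sentence per line, tokens separated by whitespace or by a <SPLIT> tag) can be sketched in plain Java. This is only an illustration of how such a line maps to tokens; the class name SplitFormatSketch and its method are hypothetical and are not part of the OpenNLP API, which reads this format through TokenSampleStream.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the OpenNLP implementation.
class SplitFormatSketch {

  // Splits one training line into tokens: whitespace and <SPLIT> both mark token boundaries.
  static List<String> tokens(String line) {
    List<String> result = new ArrayList<>();
    for (String chunk : line.trim().split("\\s+")) {
      for (String token : chunk.split("<SPLIT>")) {
        if (!token.isEmpty()) {
          result.add(token);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String line = "Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join";
    System.out.println(tokens(line));
  }
}
```

The <SPLIT> tag records a token boundary that has no whitespace in the original text, which is the kind of boundary the learnable tokenizer has to predict.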
-		</div>
-		<div class="section" title="Training API"><div class="titlepage"><div><div><h3 class="title"><a name="tools.tokenizer.training.api"></a>Training API</h3></div></div></div>
-			
-            <p>
-                The Tokenizer offers an API to train a new tokenization model. Basically three steps
-                are necessary to train it:
-                </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
-                        <p>The application must open a sample data stream</p>
-                    </li><li class="listitem">
-                        <p>Call the TokenizerME.train method</p>
-                    </li><li class="listitem">
-                        <p>Save the TokenizerModel to a file or directly use it</p>
-                    </li></ul></div><p>
-                The following sample code illustrates these steps:
-                </p><pre class="programlisting">
-                    
-Charset charset = Charset.forName(<b class="hl-string"><i style="color:red">"UTF-8"</i></b>);
-ObjectStream&lt;String&gt; lineStream = <b class="hl-keyword">new</b> PlainTextByLineStream(<b class="hl-keyword">new</b> FileInputStream(<b class="hl-string"><i style="color:red">"en-sent.train"</i></b>),
-    charset);
-ObjectStream&lt;TokenSample&gt; sampleStream = <b class="hl-keyword">new</b> TokenSampleStream(lineStream);
-
-TokenizerModel model;
-
-<b class="hl-keyword">try</b> {
-  model = TokenizerME.train(<b class="hl-string"><i style="color:red">"en"</i></b>, sampleStream, true, TrainingParameters.defaultParams());
-}
-<b class="hl-keyword">finally</b> {
-  sampleStream.close();
-}
-
-OutputStream modelOut = null;
-<b class="hl-keyword">try</b> {
-  modelOut = <b class="hl-keyword">new</b> BufferedOutputStream(<b class="hl-keyword">new</b> FileOutputStream(modelFile));
-  model.serialize(modelOut);
-} <b class="hl-keyword">finally</b> {
-  <b class="hl-keyword">if</b> (modelOut != null)
-     modelOut.close();
-}
-                </pre><p>
-            </p>
-		</div>
-	</div>
-	
-	<div class="section" title="Detokenizing"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.tokenizer.detokenizing"></a>Detokenizing</h2></div></div></div>
-		
-		<p>
-		Detokenizing is simply the opposite of tokenization: the original non-tokenized string should
-		be constructed out of a token sequence. The OpenNLP implementation was created to undo the tokenization
-		of training data for the tokenizer. It can also be used to undo the tokenization of such a trained
-		tokenizer. The implementation is strictly rule based and defines how tokens should be attached
-		to a sentence-wise character sequence.
-		</p>
-		<p>
-		The rule dictionary assigns to every token an operation which describes how it should be attached
-		to one continuous character sequence.
-		</p>
-		<p>
-		The following rules can be assigned to a token:
-		</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
-				<p>MERGE_TO_LEFT - Merges the token to the left side.</p>
-			</li><li class="listitem">
-				<p>MERGE_TO_RIGHT - Merges the token to the right side.</p>
-			</li><li class="listitem">
-				<p>RIGHT_LEFT_MATCHING - Merges the token to the right side on first occurrence
-				and to the left side on second occurrence.</p>
-			</li></ul></div><p>
-
-		The following sample illustrates how the detokenizer works with a small
-		rule dictionary (illustration format, not the XML data format):
-		</p><pre class="programlisting">
-			
-. MERGE_TO_LEFT
-" RIGHT_LEFT_MATCHING		
-		</pre><p>
-		The dictionary should be used to de-tokenize the following whitespace tokenized sentence:
-		</p><pre class="programlisting">
-			
-He said " This is a test " .		
-		</pre><p>
-		The tokens would get these tags based on the dictionary:
-		</p><pre class="programlisting">
-			
-He -&gt; NO_OPERATION
-said -&gt; NO_OPERATION
-" -&gt; MERGE_TO_RIGHT
-This -&gt; NO_OPERATION
-is -&gt; NO_OPERATION
-a -&gt; NO_OPERATION
-test -&gt; NO_OPERATION
-" -&gt; MERGE_TO_LEFT
-. -&gt; MERGE_TO_LEFT		
-			</pre><p>
-			That will result in the following character sequence:
-		</p><pre class="programlisting">
-			
-He said "This is a test".		
-		</pre><p>
-		TODO: Add documentation about the dictionary format and how to use the API. Contributions are welcome.
-		</p>
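The three merge rules and the worked example above can be sketched in plain Java. This is a minimal illustration of the rule semantics as described in this section, not the OpenNLP detokenizer API (which is still undocumented here); the class DetokenizerSketch, its Rule enum and the detokenize method are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the OpenNLP detokenizer implementation.
class DetokenizerSketch {

  enum Rule { MERGE_TO_LEFT, MERGE_TO_RIGHT, RIGHT_LEFT_MATCHING }

  static String detokenize(List<String> tokens, Map<String, Rule> rules) {
    StringBuilder out = new StringBuilder();
    Map<String, Integer> seen = new HashMap<>(); // occurrence count per matching token
    boolean glueNext = false; // true if the previous token merges to the right

    for (String token : tokens) {
      Rule rule = rules.get(token); // null means NO_OPERATION
      boolean mergeLeft = rule == Rule.MERGE_TO_LEFT;
      boolean mergeRight = rule == Rule.MERGE_TO_RIGHT;

      if (rule == Rule.RIGHT_LEFT_MATCHING) {
        int count = seen.merge(token, 1, Integer::sum);
        // First occurrence merges to the right, second to the left, and so on.
        if (count % 2 == 1) {
          mergeRight = true;
        } else {
          mergeLeft = true;
        }
      }

      // A space is written unless this token attaches to its left neighbour
      // or the left neighbour attaches to this token.
      if (out.length() > 0 && !mergeLeft && !glueNext) {
        out.append(' ');
      }
      out.append(token);
      glueNext = mergeRight;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, Rule> rules = new HashMap<>();
    rules.put(".", Rule.MERGE_TO_LEFT);
    rules.put("\"", Rule.RIGHT_LEFT_MATCHING);

    List<String> tokens = Arrays.asList("He", "said", "\"", "This", "is", "a", "test", "\"", ".");
    System.out.println(detokenize(tokens, rules)); // He said "This is a test".
  }
}
```

Tokens without a dictionary entry behave as NO_OPERATION and are simply separated by a single space.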
-		<div class="section" title="Detokenizing API"><div class="titlepage"><div><div><h3 class="title"><a name="tools.tokenizer.detokenizing.api"></a>Detokenizing API</h3></div></div></div>
-			
-			<p>TODO: Write documentation about the detokenizer api. Any contributions
-are very welcome. If you want to contribute please contact us on the mailing list
-or comment on the jira issue <a class="ulink" href="https://issues.apache.org/jira/browse/OPENNLP-216" target="_top">OPENNLP-216</a>.</p>
-		</div>
-		<div class="section" title="Detokenizer Dictionary"><div class="titlepage"><div><div><h3 class="title"><a name="tools.tokenizer.detokenizing.dict"></a>Detokenizer Dictionary</h3></div></div></div>
-			
-			<p>TODO: Write documentation about the detokenizer dictionary. Any contributions
-are very welcome. If you want to contribute please contact us on the mailing list
-or comment on the jira issue <a class="ulink" href="https://issues.apache.org/jira/browse/OPENNLP-217" target="_top">OPENNLP-217</a>.</p>
-		</div>
-	</div>
-</div>
-	<div class="chapter" title="Chapter&nbsp;4.&nbsp;Name Finder"><div class="titlepage"><div><div><h2 class="title"><a name="tools.namefind"></a>Chapter&nbsp;4.&nbsp;Name Finder</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#tools.namefind.recognition">Named Entity Recognition</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.namefind.recognition.cmdline">Name Finder Tool</a></span></dt><dt><span class="section"><a href="#tools.namefind.recognition.api">Name Finder API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.namefind.training">Name Finder Training</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.namefind.training.tool">Training Tool</a></span></dt><dt><span class="section"><a href="#tools.namefind.training.api">Training API</a></span></dt><dt><span class="section"><a href="#tools.namefind.training.featuregen">Custom Feature Generation</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.namefind.eval">Evaluation</a></span></dt><dd><dl><dt><span class="section"><a href="#tools.namefind.eval.tool">Evaluation Tool</a></span></dt><dt><span class="section"><a href="#tools.namefind.eval.api">Evaluation API</a></span></dt></dl></dd><dt><span class="section"><a href="#tools.namefind.annotation_guides">Named Entity Annotation Guidelines</a></span></dt></dl></div>
-
-	
-
-	<div class="section" title="Named Entity Recognition"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.namefind.recognition"></a>Named Entity Recognition</h2></div></div></div>
-		
-		<p>
-			The Name Finder can detect named entities and numbers in text. To be able to
-			detect entities the Name Finder needs a model. The model is dependent on the
-			language and entity type it was trained for. The OpenNLP project offers a number
-			of pre-trained name finder models which are trained on various freely available corpora.
-			They can be downloaded at our model download page. To find names in raw text the text
-			must be segmented into tokens and sentences. A detailed description is given in the
-			sentence detector and tokenizer tutorial. It is important that the tokenization for
-			the training data and the input text is identical.
-		</p>
-	
-	<div class="section" title="Name Finder Tool"><div class="titlepage"><div><div><h3 class="title"><a name="tools.namefind.recognition.cmdline"></a>Name Finder Tool</h3></div></div></div>
-		
-		<p>
-			The easiest way to try out the Name Finder is the command line tool.
-			The tool is only intended for demonstration and testing. Download the
-			English
-			person model and start the Name Finder Tool with this command:
-			</p><pre class="screen">
-				
-$ opennlp TokenNameFinder en-ner-person.bin
-			 </pre><p>
-			 
-			The name finder now reads a tokenized sentence per line from stdin, an empty
-			line indicates a document boundary and resets the adaptive feature generators.
-			Just copy this text to the terminal:
-	
-			</p><pre class="screen">
-				
-Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
-Mr . Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
-Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named
-    a director of this British industrial conglomerate .
-			 </pre><p>
-			 The name finder will now output the text with markup for person names:
-			</p><pre class="screen">
-				
-&lt;START:person&gt; Pierre Vinken &lt;END&gt; , 61 years old , will join the board as a nonexecutive director Nov. 29 .
-Mr . &lt;START:person&gt; Vinken &lt;END&gt; is chairman of Elsevier N.V. , the Dutch publishing group .
-&lt;START:person&gt; Rudolph Agnew &lt;END&gt; , 55 years old and former chairman of Consolidated Gold Fields PLC ,
-    was named a director of this British industrial conglomerate .
-			 </pre><p>
-		</p>
-	</div>
-		<div class="section" title="Name Finder API"><div class="titlepage"><div><div><h3 class="title"><a name="tools.namefind.recognition.api"></a>Name Finder API</h3></div></div></div>
-		
-		<p>
-			To use the Name Finder in a production system it is strongly recommended to embed it
-			directly into the application instead of using the command line interface.
-			First the name finder model must be loaded into memory from disk or another source.
-			In the sample below it is loaded from disk.
-			</p><pre class="programlisting">
-				
-InputStream modelIn = <b class="hl-keyword">new</b> FileInputStream(<b class="hl-string"><i style="color:red">"en-ner-person.bin"</i></b>);
-
-<b class="hl-keyword">try</b> {
-  TokenNameFinderModel model = <b class="hl-keyword">new</b> TokenNameFinderModel(modelIn);
-}
-<b class="hl-keyword">catch</b> (IOException e) {
-  e.printStackTrace();
-}
-<b class="hl-keyword">finally</b> {
-  <b class="hl-keyword">if</b> (modelIn != null) {
-    <b class="hl-keyword">try</b> {
-      modelIn.close();
-    }
-    <b class="hl-keyword">catch</b> (IOException e) {
-    }
-  }
-}
-			 </pre><p>
-			 There are a number of reasons why the model loading can fail:
-			 </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
-					<p>Issues with the underlying I/O</p>
-				</li><li class="listitem">
-					<p>The version of the model is not compatible with the OpenNLP version</p>
-				</li><li class="listitem">
-					<p>The model is loaded into the wrong component,
-					for example a tokenizer model is loaded with TokenNameFinderModel class.</p>
-				</li><li class="listitem">
-					<p>The model content is not valid for some other reason</p>
-				</li></ul></div><p>
-			After the model is loaded the NameFinderME can be instantiated.
-			</p><pre class="programlisting">
-				
-NameFinderME nameFinder = <b class="hl-keyword">new</b> NameFinderME(model);
-			</pre><p>
-			The initialization is now finished and the Name Finder can be used. The NameFinderME
-			class is not thread safe; it must only be called from one thread. To use multiple threads,
-			multiple NameFinderME instances sharing the same model instance can be created.
-			The input text should be segmented into documents, sentences and tokens.
-			To perform entity detection an application calls the find method for every sentence in the
-			document. After every document clearAdaptiveData must be called to clear the adaptive data in
-			the feature generators. Not calling clearAdaptiveData can lead to a sharp drop in the detection
-			rate after a few documents.
-			The following code illustrates that:
-			</p><pre class="programlisting">
-				
-<b class="hl-keyword">for</b> (String document[][] : documents) {
-
-  <b class="hl-keyword">for</b> (String[] sentence : document) {
-    Span nameSpans[] = nameFinder.find(sentence);
-    <i class="hl-comment" style="color: silver">// do something with the names</i>
-  }
-
-  nameFinder.clearAdaptiveData();
-}
-			 </pre><p>
-			 The following snippet shows a call to find:
-			 </p><pre class="programlisting">
-				
-String sentence[] = <b class="hl-keyword">new</b> String[]{
-    <b class="hl-string"><i style="color:red">"Pierre"</i></b>,
-    <b class="hl-string"><i style="color:red">"Vinken"</i></b>,
-    <b class="hl-string"><i style="color:red">"is"</i></b>,
-    <b class="hl-string"><i style="color:red">"61"</i></b>,
-    <b class="hl-string"><i style="color:red">"years"</i></b>,
-    <b class="hl-string"><i style="color:red">"old"</i></b>,
-    <b class="hl-string"><i style="color:red">"."</i></b>
-    };
-
-Span nameSpans[] = nameFinder.find(sentence);
-			</pre><p>
-			The nameSpans array now contains exactly one Span which marks the name Pierre Vinken.
-			The elements between the begin and end offsets are the name tokens. In this case the begin
-			offset is 0 and the end offset is 2 (the end offset is exclusive). The Span object also knows the type of the entity.
-			In this case it is person (defined by the model). It can be retrieved with a call to Span.getType().
-			In addition to the statistical Name Finder, OpenNLP also offers a dictionary and a regular
-			expression name finder implementation.
-		</p>
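The token offsets described above can be illustrated without the OpenNLP library. The sketch below assumes only what the text states (begin offset 0, end offset 2 for the sample sentence) and treats the end offset as exclusive; the class SpanOffsetsSketch and its method are hypothetical helpers, not OpenNLP API.

```java
import java.util.Arrays;

// Illustrative sketch only; mirrors the token-offset convention of a name Span.
class SpanOffsetsSketch {

  // Joins the tokens covered by [begin, end): begin is inclusive, end is exclusive.
  static String coveredTokens(String[] tokens, int begin, int end) {
    return String.join(" ", Arrays.copyOfRange(tokens, begin, end));
  }

  public static void main(String[] args) {
    String[] sentence = {"Pierre", "Vinken", "is", "61", "years", "old", "."};
    System.out.println(coveredTokens(sentence, 0, 2)); // Pierre Vinken
  }
}
```

Note that these are token indexes into the sentence array, unlike the Spans returned by the tokenizer, which hold character offsets.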
-		<p>
-			TODO: Explain how to retrieve probs from the name finder for names and for non recognized names
-		</p>
-	</div>
-	</div>
-	<div class="section" title="Name Finder Training"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tools.namefind.training"></a>Name Finder Training</h2></div></div></div>
-		
-		<p>
-			The pre-trained models might not be available for a desired language, might not detect
-			important entities, or might not perform well enough outside the news domain.
-			These are the typical reasons to do custom training of the name finder on a new corpus
-			or on a corpus which is extended by private training data taken from the data which should be analyzed.
-		</p>
-		
-		<div class="section" title="Training Tool"><div class="titlepage"><div><div><h3 class="title"><a name="tools.namefind.training.tool"></a>Training Tool</h3></div></div></div>
-		
-		<p>
-			OpenNLP has a command line tool which is used to train the models available from the model
-			download page on various corpora.
-		</p>
-		<p>
-			The data can be converted to the OpenNLP name finder training format, which is one
-            sentence per line. Some other formats are available as well.
-			The sentence must be tokenized and contain spans which mark the entities. Documents are separated by
-			empty lines which trigger the reset of the adaptive feature generators. A training file can contain
-			multiple types. If the training file contains multiple types the created model will also be able to
-			detect these multiple types.
-		</p>
-		<p>
-			Sample sentence of the data:
-			</p><pre class="screen">
-				
-&lt;START:person&gt; Pierre Vinken &lt;END&gt; , 61 years old , will join the board as a nonexecutive director Nov. 29 .
-Mr . &lt;START:person&gt; Vinken &lt;END&gt; is chairman of Elsevier N.V. , the Dutch publishing group .
-			 </pre><p>
-			 The training data should contain at least 15000 sentences to create a model which performs well.
-			 Usage of the tool:
-			</p><pre class="screen">
-				
-$ opennlp TokenNameFinderTrainer
-Usage: opennlp TokenNameFinderTrainer[.evalita|.ad|.conll03|.bionlp2004|.conll02|.muc6|.ontonotes|.brat] \
-[-featuregen featuregenFile] [-nameTypes types] [-sequenceCodec codec] [-factory factoryName] \
-[-resources resourcesDir] [-type modelType] [-params paramsFile] -lang language \
--model modelFile -data sampleData [-encoding charsetName]
-
-Arguments description:
-        -featuregen featuregenFile
-                The feature generator descriptor file
-        -nameTypes types
-                name types to use for training
-        -sequenceCodec codec
-                sequence codec used to code name spans
-        -factory factoryName
-                A sub-class of TokenNameFinderFactory
-        -resources resourcesDir
-                The resources directory
-        -type modelType
-                The type of the token name finder model
-        -params paramsFile
-                training parameters file.
-        -lang language
-                language which is being processed.
-        -model modelFile
-                output model file.
-        -data sampleData
-                data to be used, usually a file name.
-        -encoding charsetName
-                encoding for reading and writing text, if absent the system default is used.
-			 </pre><p>
-			 It is now assumed that the English person name finder model should be trained from a file
-			 called en-ner-person.train which is encoded as UTF-8. The following command will train
-			 the name finder and write the model to en-ner-person.bin:
-			 </p><pre class="screen">
-				
-$ opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data en-ner-person.train -encoding UTF-8
-			 </pre><p>
-The example above will train models with a pre-defined feature set. It is also possible to use the -resources parameter to generate features based on external knowledge, such as word representation (clustering) features. The external resources must all be placed in a resource directory which is then passed as a parameter. If this option is used, it is then required to pass, via the -featuregen parameter, an XML custom feature generator which includes some of the clustering features shipped with the TokenNameFinder. Currently three formats of clustering lexicons are accepted:
-			</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
-					<p>Space separated two column file specifying the token and the cluster class as generated by toolkits such as <a class="ulink" href="https://code.google.com/p/word2vec/" target="_top">word2vec</a>.</p>
-				</li><li class="listitem">
-					<p>Space separated three column file specifying the token, clustering class and weight, such as <a class="ulink" href="https://github.com/ninjin/clark_pos_induction" target="_top">Clark's clusters</a>.</p>
-				</li><li class="listitem">
-					<p>Tab separated three column Brown clusters as generated by <a class="ulink" href="https://github.com/percyliang/brown-cluster" target="_top">
-						Liang's toolkit</a>.</p>
-				</li></ul></div><p>
-			 Additionally it is possible to specify the number of iterations,
-			 the cutoff and to overwrite all types in the training data with a single type. Finally, the -sequenceCodec parameter allows specifying a BIO (Begin, Inside, Out) or BILOU (Begin, Inside, Last, Out, Unit) encoding to represent the Named Entities. An example of one such command would be as follows:
-			 </p><pre class="screen">
-			   
-$ opennlp TokenNameFinderTrainer -featuregen brown.xml -sequenceCodec BILOU -resources clusters/ \
--params PerceptronTrainerParams.txt -lang en -model ner-test.bin -data en-train.opennlp -encoding UTF-8
-			 </pre><p>
-		</p>
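The difference between the BIO and BILOU encodings mentioned above can be sketched in plain Java. The tag strings used here (B-, I-, L-, U- prefixes and O) follow the common CoNLL-style convention and are an assumption for illustration; they are not necessarily the outcome names OpenNLP uses internally, and the class SequenceCodecSketch is hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; not an OpenNLP SequenceCodec implementation.
class SequenceCodecSketch {

  // Tags the tokens in [begin, end) as one entity of the given type; all others get "O".
  static String[] encode(int length, int begin, int end, String type, boolean bilou) {
    String[] tags = new String[length];
    Arrays.fill(tags, "O");
    for (int i = begin; i < end; i++) {
      boolean first = i == begin;
      boolean last = i == end - 1;
      String prefix;
      if (bilou && first && last) {
        prefix = "U"; // single-token entity
      } else if (first) {
        prefix = "B";
      } else if (bilou && last) {
        prefix = "L";
      } else {
        prefix = "I";
      }
      tags[i] = prefix + "-" + type;
    }
    return tags;
  }

  public static void main(String[] args) {
    // "Pierre Vinken , 61 years old": the name covers tokens 0 and 1.
    System.out.println(Arrays.toString(encode(6, 0, 2, "person", false)));
    System.out.println(Arrays.toString(encode(6, 0, 2, "person", true)));
    // "Mr . Vinken is": the name is the single token at index 2.
    System.out.println(Arrays.toString(encode(4, 2, 3, "person", true)));
  }
}
```

BILOU gives the model explicit outcomes for the last token of a name and for single-token names, at the cost of more outcomes to learn.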
-		</div>
-		<div class="section" title="Training API"><div class="titlepage"><div><div><h3 class="title"><a name="tools.namefind.training.api"></a>Training API</h3></div></div></div>
-		
-		<p>
-			To train the name finder from within an application it is recommended to use the training
-			API instead of the command line tool.
-			Basically three steps are necessary to train it:
-			</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem">
-					<p>The application must open a sample data stream</p>
-				</li><li class="listitem">
-					<p>Call the NameFinderME.train method</p>
-				</li><li class="listitem">
-					<p>Save the TokenNameFinderModel to a file or database</p>
-				</li></ul></div><p>
-			The three steps are illustrated by the following sample code:
-			</p><pre class="programlisting">
-				
-Charset charset = Charset.forName(<b class="hl-string"><i style="color:red">"UTF-8"</i></b>);
-ObjectStream&lt;String&gt; lineStream =
-		<b class="hl-keyword">new</b> PlainTextByLineStream(<b class="hl-keyword">new</b> FileInputStream(<b class="hl-string"><i style="color:red">"en-ner-person.train"</i></b>), charset);
-ObjectStream&lt;NameSample&gt; sampleStream = <b class="hl-keyword">new</b> NameSampleDataStream(lineStream);
-
-TokenNameFinderModel model;
-
-<b class="hl-keyword">try</b> {
-  model = NameFinderME.train(<b class="hl-string"><i style="color:red">"en"</i></b>, <b class="hl-string"><i style="color:red">"person"</i></b>, sampleStream, TrainingParameters.defaultParams(),
-            nameFinderFactory); <i class="hl-comment" style="color: silver">// nameFinderFactory: a TokenNameFinderFactory instance</i>
-}
-<b class="hl-keyword">finally</b> {
-  sampleStream.close();
-}
-
-OutputStream modelOut = null;
-<b class="hl-keyword">try</b> {
-  modelOut = <b class="hl-keyword">new</b> BufferedOutputStream(<b class="hl-keyword">new</b> FileOutputStream(modelFile));
-  model.serialize(modelOut);
-} <b class="hl-keyword">finally</b> {
-  <b class="hl-keyword">if</b> (modelOut != null) 
-     modelOut.close();      
-}
-			 </pre><p>
-		</p>
-		</div>
-		
-		<div class="section" title="Custom Feature Generation"><div class="titlepage"><div><div><h3 class="title"><a name="tools.namefind.training.featuregen"></a>Custom Feature Generation</h3></div></div></div>
-		
-			<p>
-				OpenNLP defines a default feature generation which is used when no custom feature
-				generation is specified. Users who want to experiment with the feature generation
-				can provide a custom feature generator, either via the API or via an XML descriptor file.
-			</p>
-			<div class="section" title="Feature Generation defined by API"><div class="titlepage"><div><div><h4 class="title"><a name="tools.namefind.training.featuregen.api"></a>Feature Generation defined by API</h4></div></div></div>
-			
-			<p>
-				The custom generator must be used for training
-				and for detecting the names. If the feature generation during training time and detection
-				time is different the name finder might not be able to detect names.
-				The following lines show how to construct a custom feature generator
-				</p><pre class="programlisting">
-					
-AdaptiveFeatureGenerator featureGenerator = <b class="hl-keyword">new</b> CachedFeatureGenerator(
-         <b class="hl-keyword">new</b> AdaptiveFeatureGenerator[]{
-           <b class="hl-keyword">new</b> WindowFeatureGenerator(<b class="hl-keyword">new</b> TokenFeatureGenerator(), <span class="hl-number">2</span>, <span class="hl-number">2</span>),
-           <b class="hl-keyword">new</b> WindowFeatureGenerator(<b class="hl-keyword">new</b> TokenClassFeatureGenerator(true), <span class="hl-number">2</span>, <span class="hl-number">2</span>),
-           <b class="hl-keyword">new</b> OutcomePriorFeatureGenerator(),
-           <b class="hl-keyword">new</b> PreviousMapFeatureGenerator(),
-           <b class="hl-keyword">new</b> BigramNameFeatureGenerator(),
-           <b class="hl-keyword">new</b> SentenceFeatureGenerator(true, false),
-           <b class="hl-keyword">new</b> BrownTokenFeatureGenerator(dictResource) <i class="hl-comment" style="color: silver">// dictResource: a previously loaded BrownCluster</i>
-           });
-				</pre><p>
-				which is similar to the default feature generator but with a BrownTokenFeatureGenerator added.
-				The javadoc of the feature generator classes explains what the individual feature generators do.
-				To write a custom feature generator, implement the AdaptiveFeatureGenerator interface or,
-				if it does not need to be adaptive, extend the FeatureGeneratorAdapter.
-				The train method which should be used is defined as
-				</p><pre class="programlisting">
-					
-<b class="hl-keyword">public</b> <b class="hl-keyword">static</b> TokenNameFinderModel train(String languageCode, String type,
-          ObjectStream&lt;NameSample&gt; samples, TrainingParameters trainParams,
-          TokenNameFinderFactory factory) <b class="hl-keyword">throws</b> IOException
-				</pre><p>
-				where the TokenNameFinderFactory allows specifying a custom feature generator.
-				To detect names the model which was returned from the train method must be passed to the NameFinderME constructor.
-				</p><pre class="programlisting">
-					
-<b class="hl-keyword">new</b> NameFinderME(model);
-				 </pre><p>	 
-			</p>
-			</div>
-			<div class="section" title="Feature Generation defined by XML Descriptor"><div class="titlepage"><div><div><h4 class="title"><a name="tools.namefind.training.featuregen.xml"></a>Feature Generation defined by XML Descriptor</h4></div></div></div>
-			
-			<p>
-			OpenNLP can also use an XML descriptor file to configure the feature generation. The
-            descriptor
-			file is stored inside the model after training and the feature generators are configured
-			correctly when the name finder is instantiated.
-			
-			The following sample shows an XML descriptor which contains the default feature generator plus several types of clustering features:
-				</p><pre class="programlisting">
-					
-<b class="hl-tag" style="color: #000096">&lt;generators&gt;</b>
-  <b class="hl-tag" style="color: #000096">&lt;cache&gt;</b> 
-    <b class="hl-tag" style="color: #000096">&lt;generators&gt;</b>
-      <b class="hl-tag" style="color: #000096">&lt;window</b> <span class="hl-attribute" style="color: #F5844C">prevLength</span> = <span class="hl-value" style="color: #993300">"2"</span> <span class="hl-attribute" style="color: #F5844C">nextLength</span> = <span class="hl-value" style="color: #993300">"2"</span><b class="hl-tag" style="color: #000096">&gt;</b>          
-        <b class="hl-tag" style="color: #000096">&lt;tokenclass/&gt;</b>
-      <b class="hl-tag" style="color: #000096">&lt;/window&gt;</b>
-      <b class="hl-tag" style="color: #000096">&lt;window</b> <span class="hl-attribute" style="color: #F5844C">prevLength</span> = <span class="hl-value" style="color: #993300">"2"</span> <span class="hl-attribute" style="color: #F5844C">nextLength</span> = <span class="hl-value" style="color: #993300">"2"</span><b class="hl-tag" style="color: #000096">&gt;</b>              

<TRUNCATED>

[3/3] opennlp-site git commit: OPENNLP-1069: Add missing docs and automate the inclusion process

Posted by co...@apache.org.

OPENNLP-1069: Add missing docs and automate the inclusion process

Now the build downloads the distributables and extract the docs from it.
Included a legacy page.

closes apache/opennlp-site#15


Project: http://git-wip-us.apache.org/repos/asf/opennlp-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/opennlp-site/commit/08c3208c
Tree: http://git-wip-us.apache.org/repos/asf/opennlp-site/tree/08c3208c
Diff: http://git-wip-us.apache.org/repos/asf/opennlp-site/diff/08c3208c

Branch: refs/heads/master
Commit: 08c3208cde58bcdd1ac1838231320ff67df51972
Parents: d74013d
Author: William D C M SILVA <co...@apache.org>
Authored: Sat May 20 07:48:01 2017 -0300
Committer: William D C M SILVA <co...@apache.org>
Committed: Sat May 20 07:48:01 2017 -0300

----------------------------------------------------------------------
 pom.xml                                         |  158 +-
 src/main/docs/1.7.2/manual/css/opennlp-docs.css |   72 -
 src/main/docs/1.7.2/manual/images/brat.png      |  Bin 588646 -> 0 bytes
 src/main/docs/1.7.2/manual/opennlp.html         | 5388 ------------------
 src/main/jbake/content/docs/index.ad            |    2 +
 src/main/jbake/content/docs/legacy.ad           |   64 +
 6 files changed, 155 insertions(+), 5529 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/opennlp-site/blob/08c3208c/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index e8c2610..a665c05 100644
--- a/pom.xml
+++ b/pom.xml
@@ -71,7 +71,7 @@
         <executions>
           <execution>
             <id>default-generate</id>
-            <phase>generate-resources</phase>
+            <phase>compile</phase>
             <goals>
               <goal>generate</goal>
             </goals>
@@ -85,40 +85,76 @@
         <version>3.0.2</version>
         <executions>
           <execution>
-            <id>copy-docs</id>
+            <id>copy-code-formatter</id>
             <!-- here the phase you need -->
             <phase>validate</phase>
             <goals>
               <goal>copy-resources</goal>
             </goals>
             <configuration>
-              <outputDirectory>${basedir}/target/opennlp-site/docs</outputDirectory>
+              <outputDirectory>${basedir}/target/opennlp-site/code-formatter</outputDirectory>
               <resources>
                 <resource>
-                  <directory>src/main/docs</directory>
+                  <directory>src/main/code-formatter</directory>
                   <filtering>false</filtering>
                 </resource>
               </resources>
             </configuration>
           </execution>
+        </executions>
+      </plugin>
+
+      <plugin>
+        <artifactId>maven-antrun-plugin</artifactId>
+        <version>1.7</version>
+        <executions>
           <execution>
-            <id>copy-code-formatter</id>
-            <!-- here the phase you need -->
-            <phase>validate</phase>
-            <goals>
-              <goal>copy-resources</goal>
-            </goals>
+            <phase>process-resources</phase>
             <configuration>
-              <outputDirectory>${basedir}/target/opennlp-site/code-formatter</outputDirectory>
-              <resources>
-                <resource>
-                  <directory>src/main/code-formatter</directory>
-                  <filtering>false</filtering>
-                </resource>
-              </resources>
+              <target>
+                <ac:for param="folder" xmlns:ac="antlib:net.sf.antcontrib">
+                  <dirset dir="target/distr/">
+                    <include name="*"/>
+                  </dirset>
+                  <sequential>
+                    <echo>Copy @{folder} docs</echo>
+                    
+                    <copy todir="target/opennlp-site/docs">
+                        <fileset dir="@{folder}" casesensitive="yes">
+                            <include name="**/docs/**/*"/>
+                            <exclude name="**/opennlp-uima-descriptors/**"/>
+                        </fileset>
+                        <mapper type="regexp" from="^.*apache-opennlp-(.*?)/docs/(.*)" to="\1/\2" />
+                    </copy>
+
+                  </sequential>
+                </ac:for>
+ 
+              </target>
             </configuration>
+            <goals>
+              <goal>run</goal>
+            </goals>
           </execution>
         </executions>
+        <dependencies>
+          <dependency>
+            <groupId>ant-contrib</groupId>
+            <artifactId>ant-contrib</artifactId>
+            <version>1.0b3</version>
+            <exclusions>
+              <exclusion>
+                <groupId>ant</groupId>
+                <artifactId>ant</artifactId>
+              </exclusion>
+            </exclusions>
+          </dependency>
+          <dependency>
+            <groupId>org.apache.ant</groupId>
+            <artifactId>ant-nodeps</artifactId>
+            <version>1.8.1</version>
+          </dependency>
+        </dependencies>
       </plugin>
 
       <plugin>
@@ -128,89 +164,73 @@
         <executions>
           <execution>
             <id>unpack</id>
-            <phase>package</phase>
+            <phase>generate-resources</phase>
             <goals>
               <goal>unpack</goal>
             </goals>
             <configuration>
               <artifactItems>
-                <!-- Start of 1.7.2 -->
-                <artifactItem>
-                  <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-tools</artifactId>
-                  <version>1.7.2</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
-                  <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.7.2/apidocs/opennlp-tools</outputDirectory>
-                </artifactItem>
-                <artifactItem>
-                  <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-brat-annotator</artifactId>
-                  <version>1.7.2</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
-                  <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.7.2/apidocs/opennlp-brat-annotator</outputDirectory>
-                </artifactItem>
+                
                 <artifactItem>
                   <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-morfologik-addon</artifactId>
-                  <version>1.7.2</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
+                  <artifactId>opennlp-distr</artifactId>
+                  <version>1.5.3</version>
                   <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.7.2/apidocs/opennlp-morfologik-addon</outputDirectory>
+                  <type>zip</type>
+                  <classifier>bin</classifier>
+                  <outputDirectory>${project.build.directory}/distr/1.5.3</outputDirectory>
                 </artifactItem>
+
                 <artifactItem>
                   <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-uima</artifactId>
-                  <version>1.7.2</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
+                  <artifactId>opennlp-distr</artifactId>
+                  <version>1.6.0</version>
                   <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.7.2/apidocs/opennlp-uima</outputDirectory>
+                  <type>zip</type>
+                  <classifier>bin</classifier>
+                  <outputDirectory>${project.build.directory}/distr/1.6.0</outputDirectory>
                 </artifactItem>
-                <!-- End of 1.7.2 -->
 
-                <!-- Start of 1.8.0 -->
                 <artifactItem>
                   <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-tools</artifactId>
-                  <version>1.8.0</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
+                  <artifactId>opennlp-distr</artifactId>
+                  <version>1.7.0</version>
                   <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.8.0/apidocs/opennlp-tools</outputDirectory>
+                  <type>zip</type>
+                  <classifier>bin</classifier>
+                  <outputDirectory>${project.build.directory}/distr/1.7.0</outputDirectory>
                 </artifactItem>
+
                 <artifactItem>
                   <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-brat-annotator</artifactId>
-                  <version>1.8.0</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
+                  <artifactId>opennlp-distr</artifactId>
+                  <version>1.7.1</version>
                   <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.8.0/apidocs/opennlp-brat-annotator</outputDirectory>
+                  <type>zip</type>
+                  <classifier>bin</classifier>
+                  <outputDirectory>${project.build.directory}/distr/1.7.1</outputDirectory>
                 </artifactItem>
+
                 <artifactItem>
                   <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-morfologik-addon</artifactId>
-                  <version>1.8.0</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
+                  <artifactId>opennlp-distr</artifactId>
+                  <version>1.7.2</version>
                   <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.8.0/apidocs/opennlp-morfologik-addon</outputDirectory>
+                  <type>zip</type>
+                  <classifier>bin</classifier>
+                  <outputDirectory>${project.build.directory}/distr/1.7.2</outputDirectory>
                 </artifactItem>
+
                 <artifactItem>
                   <groupId>org.apache.opennlp</groupId>
-                  <artifactId>opennlp-uima</artifactId>
+                  <artifactId>opennlp-distr</artifactId>
                   <version>1.8.0</version>
-                  <type>jar</type>
-                  <classifier>javadoc</classifier>
                   <overWrite>false</overWrite>
-                  <outputDirectory>${project.build.directory}/opennlp-site/docs/1.8.0/apidocs/opennlp-uima</outputDirectory>
+                  <type>zip</type>
+                  <classifier>bin</classifier>
+                  <outputDirectory>${project.build.directory}/distr/1.8.0</outputDirectory>
                 </artifactItem>
-                <!-- End of 1.8.0 -->
+
               </artifactItems>
             </configuration>
           </execution>

http://git-wip-us.apache.org/repos/asf/opennlp-site/blob/08c3208c/src/main/docs/1.7.2/manual/css/opennlp-docs.css
----------------------------------------------------------------------
diff --git a/src/main/docs/1.7.2/manual/css/opennlp-docs.css b/src/main/docs/1.7.2/manual/css/opennlp-docs.css
deleted file mode 100644
index a026686..0000000
--- a/src/main/docs/1.7.2/manual/css/opennlp-docs.css
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-body {
- margin-top: 1em;
- margin-bottom: 1em;
- margin-left: 16%;
- margin-right: 8%
-}
-
-h1, h2, h3, h4, div.toc {
- color: #006699;
-}
-
-div.legalnotice {
- max-width: 450px;
-}
-
-pre.programlisting, pre.screen, pre.literallayout {
-  border: 1px dashed #006699;
-  background-color: #EEE;
-}
-
-/* 
- * Java syntax highlighting with eclipse default colors
- * and default font-style
- */
-pre.programlisting .hl-keyword {
-  color: #7F0055;
-  font-weight: bold; 
-}
-
-/* Seems to be broken, override red inline style of hl-string */
-pre.programlisting .hl-string, pre.programlisting b.hl-string i[style]{
-  color: #2A00FF !important;
-}
-
-pre.programlisting .hl-tag {
-  color: #3F7F7F;
-}
-
-pre.programlisting .hl-comment {
-  color: #3F5F5F;
-  font-style: italic;
-}
-
-pre.programlisting .hl-multiline-comment {
-  color: #3F5FBF;
-  font-style: italic;
-}
-
-pre.programlisting .hl-value {
-  color: #2A00FF;
-}
-
-pre.programlisting .hl-attribute {
-  color: #7F007F;
-}

http://git-wip-us.apache.org/repos/asf/opennlp-site/blob/08c3208c/src/main/docs/1.7.2/manual/images/brat.png
----------------------------------------------------------------------
diff --git a/src/main/docs/1.7.2/manual/images/brat.png b/src/main/docs/1.7.2/manual/images/brat.png
deleted file mode 100644
index 2afba39..0000000
Binary files a/src/main/docs/1.7.2/manual/images/brat.png and /dev/null differ