Posted to commits@opennlp.apache.org by jz...@apache.org on 2023/01/03 14:50:25 UTC

[opennlp] branch main updated: OPENNLP-1435 Clear typos from opennlp-docs module (#480)

This is an automated email from the ASF dual-hosted git repository.

jzemerick pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/opennlp.git


The following commit(s) were added to refs/heads/main by this push:
     new c671d607 OPENNLP-1435 Clear typos from opennlp-docs module (#480)
c671d607 is described below

commit c671d6075b531e0bdbff272187523c2c9eade75d
Author: Martin Wiesner <ma...@users.noreply.github.com>
AuthorDate: Tue Jan 3 15:50:21 2023 +0100

    OPENNLP-1435 Clear typos from opennlp-docs module (#480)
    
    - fixes typos in several docbkx files
    - switches references to http URLs, if available, to a secure form (https)
    - replaces a non-reachable URL with a web-archived version of the original
---
 opennlp-docs/src/docbkx/chunker.xml          |  2 +-
 opennlp-docs/src/docbkx/cli.xml              | 20 ++++++++---------
 opennlp-docs/src/docbkx/corpora.xml          | 32 ++++++++++++++--------------
 opennlp-docs/src/docbkx/introduction.xml     |  6 +++---
 opennlp-docs/src/docbkx/langdetect.xml       | 10 ++++-----
 opennlp-docs/src/docbkx/lemmatizer.xml       |  8 +++----
 opennlp-docs/src/docbkx/machine-learning.xml |  4 ++--
 opennlp-docs/src/docbkx/morfologik-addon.xml | 12 +++++------
 opennlp-docs/src/docbkx/namefinder.xml       | 14 ++++++------
 opennlp-docs/src/docbkx/parser.xml           |  6 +++---
 opennlp-docs/src/docbkx/postagger.xml        |  8 +++----
 opennlp-docs/src/docbkx/sentdetect.xml       |  6 +++---
 opennlp-docs/src/docbkx/tokenizer.xml        | 10 ++++-----
 opennlp-docs/src/docbkx/uima-integration.xml | 28 ++++++++++++------------
 14 files changed, 83 insertions(+), 83 deletions(-)

diff --git a/opennlp-docs/src/docbkx/chunker.xml b/opennlp-docs/src/docbkx/chunker.xml
index 262f4734..5c65deac 100644
--- a/opennlp-docs/src/docbkx/chunker.xml
+++ b/opennlp-docs/src/docbkx/chunker.xml
@@ -150,7 +150,7 @@ Sequence topSequences[] = chunk.topKSequences(sent, pos);]]>
 		</para>
 		<para>
 		The training data can be converted to the OpenNLP chunker training format,
-		which is based on <ulink url="http://www.cnts.ua.ac.be/conll2000/chunking">CoNLL2000</ulink>.
+		which is based on <ulink url="https://www.cnts.ua.ac.be/conll2000/chunking">CoNLL2000</ulink>.
         Other formats may also be available.
 		The training data consist of three columns separated one single space. Each word has been put on a
 		separate line and there is an empty line after each sentence. The first column contains
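
For context on the ChunkerME API this chunker.xml hunk documents, a minimal usage sketch looks roughly like the following; the model file name (en-chunker.bin) and the token/POS arrays are illustrative assumptions, not part of this commit:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;

public class ChunkerSketch {
    public static void main(String[] args) throws Exception {
        // Load a pre-trained chunker model; the file name is an assumption.
        try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
            ChunkerModel model = new ChunkerModel(modelIn);
            ChunkerME chunker = new ChunkerME(model);

            // Tokens and POS tags would normally come from the tokenizer and POS tagger.
            String[] sent = {"Rockwell", "said", "the", "agreement", "calls"};
            String[] pos  = {"NNP", "VBD", "DT", "NN", "VBZ"};

            // One chunk tag (e.g. B-NP, I-NP, B-VP ...) per input token.
            String[] chunks = chunker.chunk(sent, pos);
            for (int i = 0; i < sent.length; i++) {
                System.out.println(sent[i] + " " + pos[i] + " " + chunks[i]);
            }
        }
    }
}

The three-column output printed here mirrors the CoNLL2000-style training layout the hunk describes (word, POS tag, chunk tag, one word per line, empty line after each sentence).
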
diff --git a/opennlp-docs/src/docbkx/cli.xml b/opennlp-docs/src/docbkx/cli.xml
index f809029a..adc6538d 100644
--- a/opennlp-docs/src/docbkx/cli.xml
+++ b/opennlp-docs/src/docbkx/cli.xml
@@ -32,7 +32,7 @@ under the License.
 
 <title>The Command Line Interface</title>
 
-<para>This section details the available tools and parameters of the Command Line Interface. For a introduction in its usage please refer to <xref linkend='intro.cli'/>.  </para>
+<para>This section details the available tools and parameters of the Command Line Interface. For an introduction in its usage please refer to <xref linkend='intro.cli'/>.  </para>
 
 <section id='tools.cli.doccat'>
 
@@ -92,7 +92,7 @@ Arguments description:
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>encoding</entry>
@@ -139,7 +139,7 @@ Arguments description:
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>encoding</entry>
@@ -197,7 +197,7 @@ Arguments description:
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>encoding</entry>
@@ -232,7 +232,7 @@ Usage: opennlp DoccatConverter help|leipzig [help|options...]
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>encoding</entry>
@@ -299,7 +299,7 @@ Arguments description:
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>sentencesPerSample</entry>
@@ -346,7 +346,7 @@ Usage: opennlp LanguageDetectorConverter help|leipzig [help|options...]
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>sentencesPerSample</entry>
@@ -410,7 +410,7 @@ Arguments description:
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>sentencesPerSample</entry>
@@ -469,7 +469,7 @@ Arguments description:
 <entry>sentencesDir</entry>
 <entry>sentencesDir</entry>
 <entry>No</entry>
-<entry>Dir with Leipig sentences to be used</entry>
+<entry>Dir with Leipzig sentences to be used</entry>
 </row>
 <row>
 <entry>sentencesPerSample</entry>
@@ -3919,7 +3919,7 @@ Usage: opennlp ChunkerConverter help|ad [help|options...]
 Usage: opennlp Parser [-bs n -ap n -k n -tk tok_model] model < sentences 
 -bs n: Use a beam size of n.
 -ap f: Advance outcomes in with at least f% of the probability mass.
--k n: Show the top n parses.  This will also display their log-probablities.
+-k n: Show the top n parses.  This will also display their log-probabilities.
 -tk tok_model: Use the specified tokenizer model to tokenize the sentences. Defaults to a WhitespaceTokenizer.
 
 ]]>
diff --git a/opennlp-docs/src/docbkx/corpora.xml b/opennlp-docs/src/docbkx/corpora.xml
index 187c9c31..b21f61a6 100644
--- a/opennlp-docs/src/docbkx/corpora.xml
+++ b/opennlp-docs/src/docbkx/corpora.xml
@@ -144,13 +144,13 @@ F-Measure: 0.9230575441395671]]>
 		<title>Getting the data</title>
 		<para>The data consists of three files per language: one training file and two test files testa and testb.
 		The first test file will be used in the development phase for finding good parameters for the learning system.
-		The second test file will be used for the final evaluation. Currently there are data files available for two languages:
+		The second test file will be used for the final evaluation. Currently, there are data files available for two languages:
 		Spanish and Dutch.
 		</para>
 		<para>
 		The Spanish data is a collection of news wire articles made available by the Spanish EFE News Agency. The articles are
-		from May 2000. The annotation was carried out by the <ulink url="http://www.talp.cat/">TALP Research Center</ulink> of the Technical University of Catalonia (UPC)
-		and the <ulink url="http://clic.ub.edu/">Center of Language and Computation (CLiC)</ulink>of the University of Barcelona (UB), and funded by the European Commission
+		from May 2000. The annotation was carried out by the <ulink url="https://www.talp.cat/">TALP Research Center</ulink> of the Technical University of Catalonia (UPC)
+		and the <ulink url="https://web.archive.org/web/20220516042208/http://clic.ub.edu/">Center of Language and Computation (CLiC)</ulink>of the University of Barcelona (UB), and funded by the European Commission
 		through the NAMIC project (IST-1999-12392). 
 		</para>
 		<para>
@@ -159,12 +159,12 @@ F-Measure: 0.9230575441395671]]>
 		</para>
 		<para>
 		You can find the Spanish files here: 
-		<ulink url="http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html">http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html</ulink>
+		<ulink url="https://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html">https://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html</ulink>
 		You must download esp.train.gz, unzip it and you will see the file esp.train.
 		</para>
 		<para>
 		You can find the Dutch files here: 
-		<ulink url="http://www.cnts.ua.ac.be/conll2002/ner.tgz">http://www.cnts.ua.ac.be/conll2002/ner.tgz</ulink>
+		<ulink url="https://www.cnts.ua.ac.be/conll2002/ner.tgz">https://www.cnts.ua.ac.be/conll2002/ner.tgz</ulink>
 		You must unzip it and go to /ner/data/ned.train.gz, so you unzip it too, and you will see the file ned.train.
 		</para>
 		</section>
@@ -260,7 +260,7 @@ path: .\es_ner_person.bin]]>
 		<para>
 		The English data is the Reuters Corpus, which is a collection of news wire articles.
 		The Reuters Corpus can be obtained free of charges from the NIST for research
-		purposes: <ulink url="http://trec.nist.gov/data/reuters/reuters.html">http://trec.nist.gov/data/reuters/reuters.html</ulink>
+		purposes: <ulink url="https://trec.nist.gov/data/reuters/reuters.html">https://trec.nist.gov/data/reuters/reuters.html</ulink>
 		</para>
 		<para>
 		The German data is a collection of articles from the German newspaper Frankfurter
@@ -387,16 +387,16 @@ F-Measure: 0.8267557582133971]]>
 	<section id="tools.corpora.arvores-deitadas">
 		<title>Arvores Deitadas</title>
 		<para>
-		The Portuguese corpora available at <ulink url="http://www.linguateca.pt">Floresta Sintá(c)tica</ulink> project follow the Arvores Deitadas (AD) format. Apache OpenNLP includes tools to convert from AD format to native format.  
+		The Portuguese corpora available at <ulink url="https://www.linguateca.pt">Floresta Sintá(c)tica</ulink> project follow the Arvores Deitadas (AD) format. Apache OpenNLP includes tools to convert from AD format to native format.
 		</para>		
 		<section id="tools.corpora.arvores-deitadas.getting">
 			<title>Getting the data</title>
 			<para>
-			The Corpus can be downloaded from here: <ulink url="http://www.linguateca.pt/floresta/corpus.html">http://www.linguateca.pt/floresta/corpus.html</ulink>
+			The Corpus can be downloaded from here: <ulink url="https://www.linguateca.pt/floresta/corpus.html">https://www.linguateca.pt/floresta/corpus.html</ulink>
 			</para>
 			<para>
-			The Name Finder models were trained using the Amazonia corpus: <ulink url="http://www.linguateca.pt/floresta/ficheiros/gz/amazonia.ad.gz">amazonia.ad</ulink>.
-			The Chunker models were trained using the <ulink url="http://www.linguateca.pt/floresta/ficheiros/gz/Bosque_CF_8.0.ad.txt.gz">Bosque_CF_8.0.ad</ulink>.
+			The Name Finder models were trained using the Amazonia corpus: <ulink url="https://www.linguateca.pt/floresta/ficheiros/gz/amazonia.ad.gz">amazonia.ad</ulink>.
+			The Chunker models were trained using the <ulink url="https://www.linguateca.pt/floresta/ficheiros/gz/Bosque_CF_8.0.ad.txt.gz">Bosque_CF_8.0.ad</ulink>.
 			</para>
 		</section>
 		
@@ -474,15 +474,15 @@ F-Measure: 0.7717879983140168]]>
 		Penn Treebank for syntax and the Penn PropBank for predicate-argument
 		structure. Its semantic representation will include word sense
 		disambiguation for nouns and verbs, with each word sense connected to
-		an ontology, and coreference. The current goals call for annotation of
+		an ontology, and co-reference. The current goals call for annotation of
 		over a million words each of English and Chinese, and half a million
-		words of Arabic over five years." (http://catalog.ldc.upenn.edu/LDC2011T03)
+		words of Arabic over five years." (https://catalog.ldc.upenn.edu/LDC2011T03)
 	</para>
 		<section id="tools.corpora.ontonotes.namefinder">
 		<title>Name Finder Training</title>
 	<para>
 		The OntoNotes corpus can be used to train the Name Finder. The corpus
-		contains many different name types
+		contains different name types
 		to train a model for a specific type only the built-in type filter
 		option should be used.
 	</para>
@@ -535,11 +535,11 @@ path: /dev/opennlp/trunk/opennlp-tools/en-ontonotes.bin]]>
 			OpenNLP can directly be trained and evaluated on labeled data in the brat format.
 			Instructions on how to use, download and install brat can be found on the project website:
 
-			<ulink url="http://brat.nlplab.org">http://brat.nlplab.org</ulink>
+			<ulink url="https://brat.nlplab.org">https://brat.nlplab.org</ulink>
 
 			Configuration of brat, including setting up the different entities and relations can be found at:
 
-			<ulink url="http://brat.nlplab.org/configuration.html">http://brat.nlplab.org/configuration.html</ulink>
+			<ulink url="https://brat.nlplab.org/configuration.html">https://brat.nlplab.org/configuration.html</ulink>
 
 		</para>
 
@@ -548,7 +548,7 @@ path: /dev/opennlp/trunk/opennlp-tools/en-ontonotes.bin]]>
 			<title>Sentences and Tokens</title>
 			<para>
 				The brat annotation tool only adds named entity spans to the data and doesn't provide information
-				about tokens and sentences. To train the name finder this information is required. By default it
+				about tokens and sentences. To train the name finder this information is required. By default, it
 				is assumed that each line is a sentence and that tokens are whitespace separated. This can be
 				adjusted by providing a custom sentence detector and optional also a tokenizer.
 
diff --git a/opennlp-docs/src/docbkx/introduction.xml b/opennlp-docs/src/docbkx/introduction.xml
index 484e5b08..16acbbeb 100644
--- a/opennlp-docs/src/docbkx/introduction.xml
+++ b/opennlp-docs/src/docbkx/introduction.xml
@@ -34,7 +34,7 @@ under the License.
         </para>
 
         <para>
-        The goal of the OpenNLP project will be to create a mature toolkit for the abovementioned tasks.
+        The goal of the OpenNLP project will be to create a mature toolkit for the aforementioned tasks.
         An additional goal is to provide a large number of pre-built models for a variety of languages, as
         well as the annotated text resources that those models are derived from.
         </para>
@@ -306,7 +306,7 @@ $ opennlp ToolNameEvaluator -model en-model-name.bin -lang en -data input.test -
                 documentation we will refer to these models as "OpenNLP models." All NLP
                 components of OpenNLP support this type of model. The sections below in
                 this documentation describe how to train and use these models. <ulink url="https://opennlp.apache.org/models.html">Pre-trained
-                models</ulink> are available for some languages and some of the OpenNLP components.
+                models</ulink> are available for some languages and some OpenNLP components.
             </para>
         </section>
         <section id="intro.models.onnx">
@@ -318,7 +318,7 @@ $ opennlp ToolNameEvaluator -model en-model-name.bin -lang en -data input.test -
                 each of the OpenNLP components that supports ONNX models describes how to
                 use ONNX models for inference. Note that OpenNLP does not support training
                 models that can be used by the ONNX Runtime - ONNX models must be created
-                outside of OpenNLP using other tools.
+                outside OpenNLP using other tools.
             </para>
         </section>
     </section>
diff --git a/opennlp-docs/src/docbkx/langdetect.xml b/opennlp-docs/src/docbkx/langdetect.xml
index aef1fd41..3025d5e5 100644
--- a/opennlp-docs/src/docbkx/langdetect.xml
+++ b/opennlp-docs/src/docbkx/langdetect.xml
@@ -27,7 +27,7 @@ under the License.
 		<title>Classifying</title>
 		<para>
 		The OpenNLP Language Detector classifies a document in ISO-639-3 languages according to the model capabilities.
-		A model can be trained with Maxent, Perceptron or Naive Bayes algorithms. By default normalizes a text and
+		A model can be trained with Maxent, Perceptron or Naive Bayes algorithms. By default, normalizes a text and
 			the context generator extracts n-grams of size 1, 2 and 3. The n-gram sizes, the normalization and the
 			context generator can be customized by extending the LanguageDetectorFactory.
 
@@ -57,7 +57,7 @@ under the License.
 						</row>
 						<row>
 							<entry>TwitterCharSequenceNormalizer</entry>
-							<entry>Replaces hashtags and Twitter user names by blank spaces.</entry>
+							<entry>Replaces hashtags and Twitter usernames by blank spaces.</entry>
 						</row>
 						<row>
 							<entry>NumberCharSequenceNormalizer</entry>
@@ -160,8 +160,8 @@ $ bin/opennlp LanguageDetectorTrainer[.leipzig] -model modelFile [-params params
 		<section id="tools.langdetect.training.leipzig">
 			<title>Training with Leipzig</title>
 			<para>
-				The Leipzig Corpora collection presents corpora in different languages. The corpora is a collection
-				of individual sentences collected from the web and newspapers. The Corpora is available as plain text
+				The Leipzig Corpora collection presents corpora in different languages. The corpora are a collection
+				of individual sentences collected from the web and newspapers. The Corpora are available as plain text
 				and as MySQL database tables. The OpenNLP integration can only use the plain text version.
 				The	individual plain text packages can be downloaded here:
 				<ulink url="https://wortschatz.uni-leipzig.de/en/download">https://wortschatz.uni-leipzig.de/en/download</ulink>
@@ -184,7 +184,7 @@ $ bin/opennlp LanguageDetectorTrainer.leipzig -model modelFile [-params paramsFi
 			<para>
 				The following sequence of commands shows how to convert the Leipzig Corpora collection at folder
 				leipzig-train/ to the default Language Detector format, by creating groups of 5 sentences as documents
-				and limiting to 10000 documents per language. Them, it shuffles the result and select the first
+				and limiting to 10000 documents per language. Then, it shuffles the result and select the first
 				100000 lines as train corpus and the last 20000 as evaluation corpus:
 				<screen>
 					<![CDATA[
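
The langdetect.xml hunk describes classifying a document into ISO-639-3 languages with a model trained by Maxent, Perceptron or Naive Bayes. A minimal detection sketch, assuming a trained model file named langdetect.bin (the file name and sample text are assumptions):

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.langdetect.Language;
import opennlp.tools.langdetect.LanguageDetector;
import opennlp.tools.langdetect.LanguageDetectorME;
import opennlp.tools.langdetect.LanguageDetectorModel;

public class LangDetectSketch {
    public static void main(String[] args) throws Exception {
        // The model file name is an assumption; use any trained language detector model.
        try (InputStream modelIn = new FileInputStream("langdetect.bin")) {
            LanguageDetectorModel model = new LanguageDetectorModel(modelIn);
            LanguageDetector detector = new LanguageDetectorME(model);

            // predictLanguage returns the best ISO-639-3 language with a confidence score.
            Language best = detector.predictLanguage("Estava em uma marcenaria na Rua Bruno.");
            System.out.println(best.getLang() + " " + best.getConfidence());
        }
    }
}
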
diff --git a/opennlp-docs/src/docbkx/lemmatizer.xml b/opennlp-docs/src/docbkx/lemmatizer.xml
index 8be62423..44356e04 100644
--- a/opennlp-docs/src/docbkx/lemmatizer.xml
+++ b/opennlp-docs/src/docbkx/lemmatizer.xml
@@ -25,7 +25,7 @@
 			postag of the word is required to find the lemma. For example, the form
 			`show' may refer
 			to either the verb "to show" or to the noun "show".
-			Currently OpenNLP implement statistical and dictionary-based lemmatizers.
+			Currently, OpenNLP implement statistical and dictionary-based lemmatizers.
 		</para>
 		<section id="tools.lemmatizer.tagging.cmdline">
 			<title>Lemmatizer Tool</title>
@@ -75,7 +75,7 @@ signed VBD sign
 			<title>Lemmatizer API</title>
 			<para>
 				The Lemmatizer can be embedded into an application via its API.
-				Currently a statistical
+				Currently, a statistical
 				and DictionaryLemmatizer are available. Note that these two methods are
 				complementary and
 				the DictionaryLemmatizer can also be used as a way of post-processing
@@ -153,7 +153,7 @@ shrapnel	NN	shrapnel
 		]]>
 		</screen>
 				Alternatively, if a (word,postag) pair can output multiple lemmas, the
-				the lemmatizer dictionary would consists of a text file containing, for 
+				the lemmatizer dictionary would consist of a text file containing, for
 				each row, a word, its postag and the corresponding lemmas separated by "#":
 				<screen>
 		<![CDATA[
@@ -267,7 +267,7 @@ Arguments description:
 		</screen>
 					Its now assumed that the english lemmatizer model should be trained
 					from a file called
-					en-lemmatizer.train which is encoded as UTF-8. The following command will train the
+					'en-lemmatizer.train' which is encoded as UTF-8. The following command will train the
 					lemmatizer and write the model to en-lemmatizer.bin:
 					<screen>
 		<![CDATA[
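
The lemmatizer.xml hunk covers both the statistical lemmatizer and the DictionaryLemmatizer, whose rows pair a word and POS tag with one or more lemmas. A short sketch of the dictionary-based path, assuming a tab-separated dictionary file named en-lemmatizer.dict (the file name and tokens are assumptions):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import opennlp.tools.lemmatizer.DictionaryLemmatizer;

public class LemmatizerSketch {
    public static void main(String[] args) throws Exception {
        // Dictionary rows are "word<TAB>postag<TAB>lemma"; the file name is an assumption.
        try (InputStream dictIn = new FileInputStream("en-lemmatizer.dict")) {
            DictionaryLemmatizer lemmatizer = new DictionaryLemmatizer(dictIn);

            String[] tokens = {"Rockwell", "signed", "shows"};
            String[] tags   = {"NNP", "VBD", "VBZ"};

            // One lemma per token; "O" is returned for (word, postag) pairs not in the dictionary.
            String[] lemmas = lemmatizer.lemmatize(tokens, tags);
            System.out.println(Arrays.toString(lemmas));
        }
    }
}

The statistical LemmatizerME follows the same call pattern (lemmatize(tokens, tags)) once a LemmatizerModel has been loaded.
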
diff --git a/opennlp-docs/src/docbkx/machine-learning.xml b/opennlp-docs/src/docbkx/machine-learning.xml
index 2df092e2..80db8c69 100644
--- a/opennlp-docs/src/docbkx/machine-learning.xml
+++ b/opennlp-docs/src/docbkx/machine-learning.xml
@@ -31,7 +31,7 @@ under the License.
 		Maximum entropy modeling is a framework for integrating information from many heterogeneous
 		information sources for classification.  The data for a  classification problem is described
 		as a (potentially large) number of features.  These features can be quite complex and allow
-		the experimenter to make use of prior knowledge about what types of informations are expected
+		the experimenter to make use of prior knowledge about what types of information are expected
 		to be important for classification. Each feature corresponds to a constraint on the model.
 		We then compute the maximum entropy model, the model with the maximum entropy of all the models
 		that satisfy the constraints.  This term may seem perverse, since we have spent most of the book
@@ -80,7 +80,7 @@ under the License.
 		</para>
 		<para>
 		We have also set in place some interfaces and code to make it easier to automate the training
-		and evaluation process (the Evalable interface and the TrainEval class).  It is not necessary
+		and evaluation process (the Evaluable interface and the TrainEval class).  It is not necessary
 		to use this functionality, but if you do you'll find it much easier to see how well your models
 		are doing.  The opennlp.grok.preprocess.namefind package is an example of a maximum entropy
 		component which uses this functionality.
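
For reference, the conditional maximum entropy model this machine-learning.xml hunk summarizes has the standard log-linear form

p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big), \qquad Z(x) = \sum_{y'} \exp\Big(\sum_i \lambda_i f_i(x, y')\Big)

where the f_i are the features (constraints) mentioned in the hunk and the \lambda_i are the weights estimated during training.
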
diff --git a/opennlp-docs/src/docbkx/morfologik-addon.xml b/opennlp-docs/src/docbkx/morfologik-addon.xml
index 6f188448..27c2dcd0 100644
--- a/opennlp-docs/src/docbkx/morfologik-addon.xml
+++ b/opennlp-docs/src/docbkx/morfologik-addon.xml
@@ -30,27 +30,27 @@
 			<itemizedlist mark='opencircle'>
 				<listitem>
 					<para>
-					The <code>MorfologikPOSTaggerFactory</code> extends <code>POSTaggerFactory</code>, which helps creating a POSTagger model with an embedded FSA TagDictionary.
+					The <code>MorfologikPOSTaggerFactory</code> extends <code>POSTaggerFactory</code>, which helps create a POSTagger model with an embedded FSA TagDictionary.
 					</para>
 				</listitem>
 				<listitem>
 					<para>
-					The <code>MorfologikTagDictionary</code> implements a FSA based <code>TagDictionary</code>, allowing for much smaller files than the default XML based with improved memory consumption.
+					The <code>MorfologikTagDictionary</code> implements an FSA based <code>TagDictionary</code>, allowing for much smaller files than the default XML based with improved memory consumption.
 					</para>
 				</listitem>
 				<listitem>
 					<para>
-					The <code>MorfologikLemmatizer</code> implements a FSA based <code>Lemmatizer</code> dictionaries.
+					The <code>MorfologikLemmatizer</code> implements an FSA based <code>Lemmatizer</code> dictionaries.
 					</para>
 				</listitem>
 			</itemizedlist>
 		</para>
 		<para>
-		The first two implementations can be used directly from command line, as in the example bellow. Having a FSA Morfologik dictionary (see next section how to build one), you can train a POS Tagger
+		The first two implementations can be used directly from command line, as in the example bellow. Having an FSA Morfologik dictionary (see next section how to build one), you can train a POS Tagger
 		model with an embedded FSA dictionary. 
 		</para>
 		<para>
-		The example trains a POSTagger with a CONLL corpus named <code>portuguese_bosque_train.conll</code> and a FSA dictionary named 
+		The example trains a POSTagger with a CONLL corpus named <code>portuguese_bosque_train.conll</code> and an FSA dictionary named
 		<code>pt-morfologik.dict</code>. It will output a model named <code>pos-pt_fsadic.model</code>.
 		
 		<screen>
@@ -109,7 +109,7 @@ fsa.dict.encoder=prefix
 		
 				<programlisting language="java">
 		<![CDATA[
-// Part 1: compile a FSA lemma dictionary 
+// Part 1: compile an FSA lemma dictionary
    
 // we need the tabular dictionary. It is mandatory to have info 
 //  file with same name, but .info extension
diff --git a/opennlp-docs/src/docbkx/namefinder.xml b/opennlp-docs/src/docbkx/namefinder.xml
index a2dfd0fc..92e5c6bf 100644
--- a/opennlp-docs/src/docbkx/namefinder.xml
+++ b/opennlp-docs/src/docbkx/namefinder.xml
@@ -76,7 +76,7 @@ Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publis
 		<para>
 			To use the Name Finder in a production system it is strongly recommended to embed it
 			directly into the application instead of using the command line interface.
-			First the name finder model must be loaded into memory from disk or an other source.
+			First the name finder model must be loaded into memory from disk or another source.
 			In the sample below it is loaded from disk.
 			<programlisting language="java">
 				<![CDATA[
@@ -143,7 +143,7 @@ String sentence[] = new String[]{
 Span nameSpans[] = nameFinder.find(sentence);]]>
 			</programlisting>
 			The nameSpans arrays contains now exactly one Span which marks the name Pierre Vinken. 
-			The elements between the begin and end offsets are the name tokens. In this case the begin 
+			The elements between the start and end offsets are the name tokens. In this case the start
 			offset is 0 and the end offset is 2. The Span object also knows the type of the entity.
 			In this case it is person (defined by the model). It can be retrieved with a call to Span.getType().
 			Additionally to the statistical Name Finder, OpenNLP also offers a dictionary and a regular
@@ -240,13 +240,13 @@ Arguments description:
                 encoding for reading and writing text, if absent the system default is used.]]>
 			 </screen>
 			 It is now assumed that the english person name finder model should be trained from a file
-			 called en-ner-person.train which is encoded as UTF-8. The following command will train
+			 called 'en-ner-person.train' which is encoded as UTF-8. The following command will train
 			 the name finder and write the model to en-ner-person.bin:
 			 <screen>
 				<![CDATA[
 $ opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data en-ner-person.train -encoding UTF-8]]>
 			 </screen>
-The example above will train models with a pre-defined feature set. It is also possible to use the -resources parameter to generate features based on external knowledge such as those based on word representation (clustering) features. The external resources must all be placed in a resource directory which is then passed as a parameter. If this option is used it is then required to pass, via the -featuregen parameter, a XML custom feature generator which includes some of the clustering fe [...]
+The example above will train models with a pre-defined feature set. It is also possible to use the -resources parameter to generate features based on external knowledge such as those based on word representation (clustering) features. The external resources must all be placed in a resource directory which is then passed as a parameter. If this option is used it is then required to pass, via the -featuregen parameter, an XML custom feature generator which includes some clustering features [...]
 			<itemizedlist>
 				<listitem>
 					<para>Space separated two column file specifying the token and the cluster class as generated by toolkits such as <ulink url="https://code.google.com/p/word2vec/">word2vec</ulink>.</para>
@@ -309,7 +309,7 @@ try (ObjectStream modelOut = new BufferedOutputStream(new FileOutputStream(model
 			<para>
 				OpenNLP defines a default feature generation which is used when no custom feature
 				generation is specified. Users which want to experiment with the feature generation
-				can provide a custom feature generator. Either via API or via an xml descriptor file.
+				can provide a custom feature generator. Either via an API or via a xml descriptor file.
 			</para>
 			<section id="tools.namefind.training.featuregen.api">
 			<title>Feature Generation defined by API</title>
@@ -476,7 +476,7 @@ new NameFinderME(model);]]>
 			      </row>
 			      <row>
 					<entry>WindowFeatureGeneratorFactory</entry>
-					<entry><emphasis>prevLength</emphasis> and <emphasis>nextLength</emphasis> must be integers ans specify the window size</entry>
+					<entry><emphasis>prevLength</emphasis> and <emphasis>nextLength</emphasis> must be integers and specify the window size</entry>
 			      </row>
 			    </tbody>
 			  </tgroup>
@@ -551,7 +551,7 @@ System.out.println(result.toString());]]>
 				<itemizedlist>
 				<listitem>
 					<para>
-						<ulink  url="http://cs.nyu.edu/cs/faculty/grishman/NEtask20.book_1.html">
+						<ulink  url="https://cs.nyu.edu/cs/faculty/grishman/NEtask20.book_1.html">
 							MUC6
 						</ulink>
 					</para>
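
The namefinder.xml hunk refers to loading a model, calling nameFinder.find(sentence) and reading Span.getType(); a minimal end-to-end sketch, assuming the en-ner-person.bin model mentioned in the hunk and an already tokenized sentence:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class NameFinderSketch {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            NameFinderME nameFinder = new NameFinderME(model);

            // Input must already be tokenized, as in the documentation example.
            String[] sentence = {"Pierre", "Vinken", "is", "61", "years", "old", "."};

            Span[] spans = nameFinder.find(sentence);
            String[] names = Span.spansToStrings(spans, sentence);
            for (int i = 0; i < spans.length; i++) {
                // getType() returns the entity type defined by the model, e.g. "person".
                System.out.println(spans[i].getType() + ": " + names[i]);
            }

            // The name finder keeps adaptive data between calls; it is typically cleared per document.
            nameFinder.clearAdaptiveData();
        }
    }
}
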
diff --git a/opennlp-docs/src/docbkx/parser.xml b/opennlp-docs/src/docbkx/parser.xml
index 7f947cae..64cac083 100644
--- a/opennlp-docs/src/docbkx/parser.xml
+++ b/opennlp-docs/src/docbkx/parser.xml
@@ -43,7 +43,7 @@ under the License.
 		<para>
 		The easiest way to try out the Parser is the command line tool.
 		The tool is only intended for demonstration and testing.
-		Download the English chunking parser model from the our website and start the Parse
+		Download the English chunking parser model from the website and start the Parse
  		Tool with the following command.
 				<screen>
 				<![CDATA[
@@ -140,7 +140,7 @@ Parse topParses[] = ParserTool.parseLine(sentence, parser, 1);]]>
 		Penn Treebank annotation guidelines can be found on the
             <ulink url="https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">Penn Treebank home page</ulink>.
 		A parser model also contains a pos tagger model, depending on the amount of available
-		training data it is recommend to switch the tagger model against a tagger model which
+		training data it is recommended to switch the tagger model against a tagger model which
 		was trained on a larger corpus. The pre-trained parser model provided on the website
 		is doing this to achieve a better performance. (TODO: On which data is the model on
 		the website trained, and say on which data the tagger model is trained)
@@ -322,7 +322,7 @@ Usage: opennlp ParserEvaluator[.ontonotes|frenchtreebank] [-misclassified true|f
                -data sampleData [-encoding charsetName]]]>
 		</screen>
 				A sample of the command considering you have a data sample named
-				en-parser-chunking.eval
+				en-parser-chunking.eval,
 				and you trained a model called en-parser-chunking.bin:
 				<screen>
 				<![CDATA[
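
The parser.xml hunk shows ParserTool.parseLine(sentence, parser, 1); a slightly fuller sketch, assuming the en-parser-chunking.bin model referenced in the hunk and a whitespace-tokenized input sentence:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

public class ParserSketch {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-parser-chunking.bin")) {
            ParserModel model = new ParserModel(modelIn);
            Parser parser = ParserFactory.create(model);

            // parseLine expects a whitespace-tokenized sentence and returns the top-k parses.
            String sentence = "The quick brown fox jumps over the lazy dog .";
            Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);

            // Print the bracketed parse tree of the best parse.
            topParses[0].show();
        }
    }
}
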
diff --git a/opennlp-docs/src/docbkx/postagger.xml b/opennlp-docs/src/docbkx/postagger.xml
index ad98178c..69eacc60 100644
--- a/opennlp-docs/src/docbkx/postagger.xml
+++ b/opennlp-docs/src/docbkx/postagger.xml
@@ -134,7 +134,7 @@ That_DT sounds_VBZ good_JJ ._.]]>
 			training material it is suggested to use an empty line.
 		</para>
 		<para>The Part-of-Speech Tagger can either be trained with a command line tool,
-		or via an training API.
+		or via a training API.
 		</para>
 		
 		<section id="tools.postagger.training.tool">
@@ -195,7 +195,7 @@ $ opennlp POSTaggerTrainer -type maxent -model en-pos-maxent.bin \
 				<para>The application must open a sample data stream</para>
 			</listitem>
 			<listitem>
-				<para>Call the POSTagger.train method</para>
+				<para>Call the 'POSTagger.train' method</para>
 			</listitem>
 			<listitem>
 				<para>Save the POSModel to a file</para>
@@ -232,10 +232,10 @@ try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(model
 		<para>
 		The tag dictionary is a word dictionary which specifies which tags a specific token can have. Using a tag
 		dictionary has two advantages, inappropriate tags can not been assigned to tokens in the dictionary and the
-		beam search algorithm has to consider less possibilities and can search faster.
+		beam search algorithm has to consider fewer possibilities and can search faster.
 		</para>
 		<para>
-		The dictionary is defined in an xml format and can be created and stored with the POSDictionary class.
+		The dictionary is defined in a xml format and can be created and stored with the POSDictionary class.
 		Please for now checkout the javadoc and source code of that class.
 		</para>
 		<para>Note: The format should be documented and sample code should show how to use the dictionary.
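
The postagger.xml hunk outlines API training as: open a sample data stream, call the train method, save the POSModel. A hedged sketch of that flow; the training and output file names are assumptions, and the default training parameters are used for brevity:

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerFactory;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.postag.WordTagSampleStream;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class PosTrainSketch {
    public static void main(String[] args) throws Exception {
        // Training data in the word_TAG one-sentence-per-line format; the file name is an assumption.
        MarkableFileInputStreamFactory dataIn =
                new MarkableFileInputStreamFactory(new File("en-pos.train"));

        try (ObjectStream<String> lines =
                     new PlainTextByLineStream(dataIn, StandardCharsets.UTF_8);
             ObjectStream<POSSample> samples = new WordTagSampleStream(lines)) {

            // Train a model with the default parameters.
            POSModel model = POSTaggerME.train("en", samples,
                    TrainingParameters.defaultParams(), new POSTaggerFactory());

            // Save the POSModel to a file.
            try (OutputStream modelOut =
                         new BufferedOutputStream(new FileOutputStream("en-pos-maxent.bin"))) {
                model.serialize(modelOut);
            }
        }
    }
}
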
diff --git a/opennlp-docs/src/docbkx/sentdetect.xml b/opennlp-docs/src/docbkx/sentdetect.xml
index 2f2fd1e8..ee7868eb 100644
--- a/opennlp-docs/src/docbkx/sentdetect.xml
+++ b/opennlp-docs/src/docbkx/sentdetect.xml
@@ -32,7 +32,7 @@ under the License.
 		marks the end of a sentence or not. In this sense a sentence is defined 
 		as the longest white space trimmed character sequence between two punctuation
 		marks. The first and last sentence make an exception to this rule. The first 
-		non whitespace character is assumed to be the begin of a sentence, and the 
+		non whitespace character is assumed to be the start of a sentence, and the
 		last non whitespace character is assumed to be a sentence end.
 		The sample text below should be segmented into its sentences.
 		<screen>
@@ -50,7 +50,7 @@ Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
 Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC,
     was named a director of this British industrial conglomerate.]]>
 		</screen>
-		Usually Sentence Detection is done before the text is tokenized and that's the way the pre-trained models on the web site are trained,
+		Usually Sentence Detection is done before the text is tokenized and that's the way the pre-trained models on the website are trained,
 		but it is also possible to perform tokenization first and let the Sentence Detector process the already tokenized text.
 		The OpenNLP Sentence Detector cannot identify sentence boundaries based on the contents of the sentence. A prominent example is the first sentence in an article where the title is mistakenly identified to be the first part of the first sentence.
 		Most components in OpenNLP expect input which is segmented into sentences.
@@ -117,7 +117,7 @@ Span sentences[] = sentenceDetector.sentPosDetect("  First sentence. Second sent
 		OpenNLP has a command line tool which is used to train the models available from the model
 		download page on various corpora. The data must be converted to the OpenNLP Sentence Detector
 		training format. Which is one sentence per line. An empty line indicates a document boundary.
-		In case the document boundary is unknown, its recommended to have an empty line every few ten
+		In case the document boundary is unknown, it's recommended to have an empty line every few ten
 		sentences. Exactly like the output in the sample above.
 		Usage of the tool:
 		<screen>
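
The sentdetect.xml hunk mentions sentPosDetect and the start/end conventions for sentence boundaries; a minimal sketch contrasting sentDetect and sentPosDetect, assuming an en-sent.bin model file:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;

public class SentDetectSketch {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-sent.bin")) {
            SentenceModel model = new SentenceModel(modelIn);
            SentenceDetectorME detector = new SentenceDetectorME(model);

            String text = "  First sentence. Second sentence. ";

            // sentDetect returns the sentences as trimmed strings ...
            String[] sentences = detector.sentDetect(text);

            // ... while sentPosDetect returns their character offsets as Spans.
            Span[] spans = detector.sentPosDetect(text);

            for (int i = 0; i < sentences.length; i++) {
                System.out.println(spans[i] + " " + sentences[i]);
            }
        }
    }
}
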
diff --git a/opennlp-docs/src/docbkx/tokenizer.xml b/opennlp-docs/src/docbkx/tokenizer.xml
index d596d756..32d4f241 100644
--- a/opennlp-docs/src/docbkx/tokenizer.xml
+++ b/opennlp-docs/src/docbkx/tokenizer.xml
@@ -116,7 +116,7 @@ $ opennlp TokenizerME en-token.bin < article.txt > article-tokenized.txt]]>
 		</para>
 		<para>
 			Since most text comes truly raw and doesn't have sentence boundaries
-			and such, its possible to create a pipe which first performs sentence
+			and such, it's possible to create a pipe which first performs sentence
 			boundary detection and tokenization. The following sample illustrates
 			that.
 			<screen>
@@ -179,7 +179,7 @@ String tokens[] = tokenizer.tokenize("An input sample sentence.");]]>
 "An", "input", "sample", "sentence", "."]]>
 		 </programlisting>
 			The second method, tokenizePos returns an array of Spans, each Span
-			contain the begin and end character offsets of the token in the input
+			contain the start and end character offsets of the token in the input
 			String.
 			<programlisting language="java">
 			<![CDATA[
@@ -215,7 +215,7 @@ double tokenProbs[] = tokenizer.getTokenProbabilities();]]>
 				available from the model download page on various corpora. The data
 				can be converted to the OpenNLP Tokenizer training format or used directly.
                 The OpenNLP format contains one sentence per line. Tokens are either separated by a
-                whitespace or by a special &lt;SPLIT&gt; tag. Tokens are split automaticaly on whitespace
+                whitespace or by a special &lt;SPLIT&gt; tag. Tokens are split automatically on whitespace
                 and at least one &lt;SPLIT&gt; tag must be present in the training text.
 				
 				The following sample shows the sample from above in the correct format.
@@ -413,12 +413,12 @@ He said "This is a test".]]>
 InputStream dictIn = new FileInputStream("latin-detokenizer.xml");
 DetokenizationDictionary dict = new DetokenizationDictionary(dictIn);]]>
 				</programlisting>
-				After the rule dictionary is loadeed the DictionaryDetokenizer can be instantiated.
+				After the rule dictionary is loaded the DictionaryDetokenizer can be instantiated.
 				<programlisting language="java">
 					<![CDATA[
 Detokenizer detokenizer = new DictionaryDetokenizer(dict);]]>
 				</programlisting>
-				The detokenizer offers two detokenize methods,the first detokenize the input tokens into a String.
+				The detokenizer offers two detokenize methods, the first detokenize the input tokens into a String.
 				<programlisting language="java">
 					<![CDATA[
 String[] tokens = new String[]{"A", "co", "-", "worker", "helped", "."};
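
The tokenizer.xml hunk touches tokenizePos (start/end character offsets per token) as well as the DictionaryDetokenizer; a small sketch of the offset-returning tokenizer path, with en-token.bin as an assumed model file:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class TokenizerSketch {
    public static void main(String[] args) throws Exception {
        try (InputStream modelIn = new FileInputStream("en-token.bin")) {
            TokenizerModel model = new TokenizerModel(modelIn);
            Tokenizer tokenizer = new TokenizerME(model);

            String input = "An input sample sentence.";

            // tokenizePos returns one Span per token with its start/end character offsets.
            Span[] spans = tokenizer.tokenizePos(input);
            for (Span span : spans) {
                System.out.println(span + " -> " + input.substring(span.getStart(), span.getEnd()));
            }
        }
    }
}

Going the other way, the DictionaryDetokenizer from the same hunk offers a detokenize method that joins the tokens back into a String according to its rule dictionary.
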
diff --git a/opennlp-docs/src/docbkx/uima-integration.xml b/opennlp-docs/src/docbkx/uima-integration.xml
index a8bc5279..e29162ce 100644
--- a/opennlp-docs/src/docbkx/uima-integration.xml
+++ b/opennlp-docs/src/docbkx/uima-integration.xml
@@ -21,13 +21,13 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-<chapter id="org.apche.opennlp.uima">
+<chapter id="org.apache.opennlp.uima">
 <title>UIMA Integration</title>
 <para>
 	The UIMA Integration wraps the OpenNLP components in UIMA Analysis Engines which can 
 	be used to automatically annotate text and train new OpenNLP models from annotated text.
 </para>
-	<section id="org.apche.opennlp.running-pear-sample">
+	<section id="org.apache.opennlp.running-pear-sample">
 		<title>Running the pear sample in CVD</title>
 		<para>
 			The Cas Visual Debugger is shipped as part of the UIMA distribution and is a tool which can run
@@ -55,27 +55,27 @@ createPear:
      [copy] Copying 1 file to OpenNlpTextAnalyzer/lib
      [copy] Copying 3 files to OpenNlpTextAnalyzer/lib
     [mkdir] Created dir: OpenNlpTextAnalyzer/models
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-token.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-token.bin
       [get] To: OpenNlpTextAnalyzer/models/en-token.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-sent.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-sent.bin
       [get] To: OpenNlpTextAnalyzer/models/en-sent.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-ner-date.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-ner-date.bin
       [get] To: OpenNlpTextAnalyzer/models/en-ner-date.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-ner-location.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-ner-location.bin
       [get] To: OpenNlpTextAnalyzer/models/en-ner-location.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-ner-money.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-ner-money.bin
       [get] To: OpenNlpTextAnalyzer/models/en-ner-money.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-ner-organization.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-ner-organization.bin
       [get] To: OpenNlpTextAnalyzer/models/en-ner-organization.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-ner-percentage.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-ner-percentage.bin
       [get] To: OpenNlpTextAnalyzer/models/en-ner-percentage.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-ner-person.bin
       [get] To: OpenNlpTextAnalyzer/models/en-ner-person.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-ner-time.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-ner-time.bin
       [get] To: OpenNlpTextAnalyzer/models/en-ner-time.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin
       [get] To: OpenNlpTextAnalyzer/models/en-pos-maxent.bin
-      [get] Getting: http://opennlp.sourceforge.net/models-1.5/en-chunker.bin
+      [get] Getting: https://opennlp.sourceforge.net/models-1.5/en-chunker.bin
       [get] To: OpenNlpTextAnalyzer/models/en-chunker.bin
       [zip] Building zip: OpenNlpTextAnalyzer.pear
 
@@ -92,7 +92,7 @@ Total time: 3 minutes 20 seconds]]>
 			must be written in English.
 		</para>
 	</section>
-	<section id="org.apche.opennlp.further-help">
+	<section id="org.apache.opennlp.further-help">
 		<title>Further Help</title>
 		<para>
 			For more information about how to use the integration please consult the javadoc of the individual