Posted to commits@opennlp.apache.org by co...@apache.org on 2017/05/09 16:13:25 UTC

opennlp git commit: OPENNLP-1052: Update README and CLI docbook before release

Repository: opennlp
Updated Branches:
  refs/heads/master 3ab6698b6 -> db9c511e8


OPENNLP-1052: Update README and CLI docbook before release

closes apache/opennlp#195


Project: http://git-wip-us.apache.org/repos/asf/opennlp/repo
Commit: http://git-wip-us.apache.org/repos/asf/opennlp/commit/db9c511e
Tree: http://git-wip-us.apache.org/repos/asf/opennlp/tree/db9c511e
Diff: http://git-wip-us.apache.org/repos/asf/opennlp/diff/db9c511e

Branch: refs/heads/master
Commit: db9c511e8d5c3665eb2bb31cf0b11c0302252d45
Parents: 3ab6698
Author: William D C M SILVA <co...@apache.org>
Authored: Tue May 9 13:09:46 2017 -0300
Committer: William D C M SILVA <co...@apache.org>
Committed: Tue May 9 13:09:46 2017 -0300

----------------------------------------------------------------------
 opennlp-distr/README            |  29 +-
 opennlp-docs/src/docbkx/cli.xml | 582 +++++++++++++++++++++--------------
 2 files changed, 364 insertions(+), 247 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/opennlp/blob/db9c511e/opennlp-distr/README
----------------------------------------------------------------------
diff --git a/opennlp-distr/README b/opennlp-distr/README
index 12dc8ec..975c651 100644
--- a/opennlp-distr/README
+++ b/opennlp-distr/README
@@ -19,22 +19,25 @@ What is new in Apache OpenNLP ${pom.version}
 ---------------------------------------
 
 This release introduces many new features, improvements and bug fixes. The API
-has been improved for a better consistency and 1.4 deprecated methods were
-removed. Now Java 1.8 is required.
+has been improved for a better consistency and many deprecated methods were
+removed. Java 1.8 is required.
 
 Additionally the release contains the following noteworthy changes:
 
-- Name Finder evaluation can now show a confusion matrix
-- The default evaluation output contains more details
-- Added a Language Model CLI tool
-- Add Moses format support
-- More refactoring and cleanup, specially in Machine Learning package and Dictionary
-- Removed deprecated trainers from UIMA integration
-- Fixed potential localization issues and added maven plugin to prevent it (ForbiddenAPI)
-- Fixed issues with the BRAT corpus reader
-- Deprecated GIS class, will be removed in a future 1.8.x release
+- POS Tagger context generator now supports feature generation XML
+- Add a Name Finder feature generator that adds POS Tag features
+- Add CONLL-U format support
+- Improve default Name Finder settings
+- TokenNameFinderEvaluator CLI now support nameTypes argument
+- Stupid backoff is now the default in NGramLanguageModel
+- Language codes now are ISO 639-3 compliant
+- Add many unit tests
+- Distribution package now includes example parameters file
+- Now prefix and suffix feature generators are configurable
+- Remove API in Document Categorizer for user specified tokenizer
+- Learnable lemmatizer now returns all possible lemmas for a given word and pos tag
+- Add stemmer, detokenizer and sentence detection abbreviations for Irish
+- Chunker SequenceValidator signature changed to allow access to both token and POS tag
 
 A detailed list of the issues related to this release can be found in the release
 notes.
-
-

http://git-wip-us.apache.org/repos/asf/opennlp/blob/db9c511e/opennlp-docs/src/docbkx/cli.xml
----------------------------------------------------------------------
diff --git a/opennlp-docs/src/docbkx/cli.xml b/opennlp-docs/src/docbkx/cli.xml
index 3dc66b7..1a8c326 100644
--- a/opennlp-docs/src/docbkx/cli.xml
+++ b/opennlp-docs/src/docbkx/cli.xml
@@ -42,7 +42,7 @@ under the License.
 
 <title>Doccat</title>
 
-<para>Learnable document categorizer</para>
+<para>Learned document categorizer</para>
 
 <screen>
 <![CDATA[
@@ -60,15 +60,15 @@ Usage: opennlp Doccat model < documents
 
 <screen>
 <![CDATA[
-Usage: opennlp DoccatTrainer[.leipzig] [-factory factoryName] [-tokenizer tokenizer] [-featureGenerators fg] 
+Usage: opennlp DoccatTrainer[.leipzig] [-factory factoryName] [-featureGenerators fg] [-tokenizer tokenizer] 
         [-params paramsFile] -lang language -model modelFile -data sampleData [-encoding charsetName] 
 Arguments description:
 	-factory factoryName
 		A sub-class of DoccatFactory where to get implementation and resources.
-	-tokenizer tokenizer
-		Tokenizer implementation. WhitespaceTokenizer is used if not specified.
 	-featureGenerators fg
 		Comma separated feature generator classes. Bag of words is used if not specified.
+	-tokenizer tokenizer
+		Tokenizer implementation. WhitespaceTokenizer is used if not specified.
 	-params paramsFile
 		training parameters file.
 	-lang language
@@ -113,13 +113,13 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp DoccatEvaluator[.leipzig] [-misclassified true|false] -model model [-reportOutputFile 
+Usage: opennlp DoccatEvaluator[.leipzig] -model model [-misclassified true|false] [-reportOutputFile 
         outputFile] -data sampleData [-encoding charsetName] 
 Arguments description:
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-reportOutputFile outputFile
 		the path of the fine-grained report file.
 	-data sampleData
@@ -160,20 +160,20 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp DoccatCrossValidator[.leipzig] [-folds num] [-misclassified true|false] [-factory factoryName] 
-        [-tokenizer tokenizer] [-featureGenerators fg] [-params paramsFile] -lang language [-reportOutputFile 
+Usage: opennlp DoccatCrossValidator[.leipzig] [-misclassified true|false] [-folds num] [-factory factoryName] 
+        [-featureGenerators fg] [-tokenizer tokenizer] [-params paramsFile] -lang language [-reportOutputFile 
         outputFile] -data sampleData [-encoding charsetName] 
 Arguments description:
-	-folds num
-		number of folds, default is 10.
 	-misclassified true|false
 		if true will print false negatives and false positives.
+	-folds num
+		number of folds, default is 10.
 	-factory factoryName
 		A sub-class of DoccatFactory where to get implementation and resources.
-	-tokenizer tokenizer
-		Tokenizer implementation. WhitespaceTokenizer is used if not specified.
 	-featureGenerators fg
 		Comma separated feature generator classes. Bag of words is used if not specified.
+	-tokenizer tokenizer
+		Tokenizer implementation. WhitespaceTokenizer is used if not specified.
 	-params paramsFile
 		training parameters file.
 	-lang language
@@ -351,18 +351,18 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -463,13 +463,13 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp TokenizerMEEvaluator[.ad|.pos|.conllx|.namefinder|.parse] [-misclassified true|false] -model 
-        model -data sampleData [-encoding charsetName] 
+Usage: opennlp TokenizerMEEvaluator[.ad|.pos|.conllx|.namefinder|.parse] -model model [-misclassified 
+        true|false] -data sampleData [-encoding charsetName] 
 Arguments description:
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-data sampleData
 		data to be used, usually a file name.
 	-encoding charsetName
@@ -490,18 +490,18 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -602,14 +602,14 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp TokenizerCrossValidator[.ad|.pos|.conllx|.namefinder|.parse] [-folds num] [-misclassified 
-        true|false] [-factory factoryName] [-abbDict path] [-alphaNumOpt isAlphaNumOpt] [-params paramsFile] 
+Usage: opennlp TokenizerCrossValidator[.ad|.pos|.conllx|.namefinder|.parse] [-misclassified true|false] 
+        [-folds num] [-factory factoryName] [-abbDict path] [-alphaNumOpt isAlphaNumOpt] [-params paramsFile] 
         -lang language -data sampleData [-encoding charsetName] 
 Arguments description:
-	-folds num
-		number of folds, default is 10.
 	-misclassified true|false
 		if true will print false negatives and false positives.
+	-folds num
+		number of folds, default is 10.
 	-factory factoryName
 		A sub-class of TokenizerFactory where to get implementation and resources.
 	-abbDict path
@@ -640,18 +640,18 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -769,18 +769,18 @@ Usage: opennlp TokenizerConverter help|ad|pos|conllx|namefinder|parse [help|opti
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -916,15 +916,15 @@ Usage: opennlp SentenceDetector model < sentences
 <screen>
 <![CDATA[
 Usage: opennlp SentenceDetectorTrainer[.ad|.pos|.conllx|.namefinder|.parse|.moses|.letsmt] [-factory 
-        factoryName] [-eosChars string] [-abbDict path] [-params paramsFile] -lang language -model modelFile 
+        factoryName] [-abbDict path] [-eosChars string] [-params paramsFile] -lang language -model modelFile 
         -data sampleData [-encoding charsetName] 
 Arguments description:
 	-factory factoryName
 		A sub-class of SentenceDetectorFactory where to get implementation and resources.
-	-eosChars string
-		EOS characters.
 	-abbDict path
 		abbreviation dictionary in XML format.
+	-eosChars string
+		EOS characters.
 	-params paramsFile
 		training parameters file.
 	-lang language
@@ -951,18 +951,18 @@ Arguments description:
 <entry>Encoding for reading and writing text.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>includeTitles</entry>
 <entry>includeTitles</entry>
 <entry>Yes</entry>
 <entry>If true will include sentences marked as headlines.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -1089,13 +1089,13 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp SentenceDetectorEvaluator[.ad|.pos|.conllx|.namefinder|.parse|.moses|.letsmt] [-misclassified 
-        true|false] -model model -data sampleData [-encoding charsetName] 
+Usage: opennlp SentenceDetectorEvaluator[.ad|.pos|.conllx|.namefinder|.parse|.moses|.letsmt] -model model 
+        [-misclassified true|false] -data sampleData [-encoding charsetName] 
 Arguments description:
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-data sampleData
 		data to be used, usually a file name.
 	-encoding charsetName
@@ -1116,18 +1116,18 @@ Arguments description:
 <entry>Encoding for reading and writing text.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>includeTitles</entry>
 <entry>includeTitles</entry>
 <entry>Yes</entry>
 <entry>If true will include sentences marked as headlines.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -1255,23 +1255,23 @@ Arguments description:
 <screen>
 <![CDATA[
 Usage: opennlp SentenceDetectorCrossValidator[.ad|.pos|.conllx|.namefinder|.parse|.moses|.letsmt] [-factory 
-        factoryName] [-eosChars string] [-abbDict path] [-params paramsFile] -lang language [-folds num] 
-        [-misclassified true|false] -data sampleData [-encoding charsetName] 
+        factoryName] [-abbDict path] [-eosChars string] [-params paramsFile] -lang language [-misclassified 
+        true|false] [-folds num] -data sampleData [-encoding charsetName] 
 Arguments description:
 	-factory factoryName
 		A sub-class of SentenceDetectorFactory where to get implementation and resources.
-	-eosChars string
-		EOS characters.
 	-abbDict path
 		abbreviation dictionary in XML format.
+	-eosChars string
+		EOS characters.
 	-params paramsFile
 		training parameters file.
 	-lang language
 		language which is being processed.
-	-folds num
-		number of folds, default is 10.
 	-misclassified true|false
 		if true will print false negatives and false positives.
+	-folds num
+		number of folds, default is 10.
 	-data sampleData
 		data to be used, usually a file name.
 	-encoding charsetName
@@ -1292,18 +1292,18 @@ Arguments description:
 <entry>Encoding for reading and writing text.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>includeTitles</entry>
 <entry>includeTitles</entry>
 <entry>Yes</entry>
 <entry>If true will include sentences marked as headlines.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -1447,18 +1447,18 @@ Usage: opennlp SentenceDetectorConverter help|ad|pos|conllx|namefinder|parse|mos
 <entry>Encoding for reading and writing text.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>includeTitles</entry>
 <entry>includeTitles</entry>
 <entry>Yes</entry>
 <entry>If true will include sentences marked as headlines.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -1642,14 +1642,14 @@ Arguments description:
 <tbody>
 <row>
 <entry morerows='3' valign='middle'>evalita</entry>
-<entry>lang</entry>
-<entry>it</entry>
+<entry>types</entry>
+<entry>per,loc,org,gpe</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,gpe</entry>
+<entry>lang</entry>
+<entry>it</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -1673,18 +1673,18 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -1692,14 +1692,14 @@ Arguments description:
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll03</entry>
-<entry>lang</entry>
-<entry>en|de</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>eng|deu</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -1736,14 +1736,14 @@ Arguments description:
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll02</entry>
-<entry>lang</entry>
-<entry>es|nl</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>es|nl</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -1836,17 +1836,17 @@ Arguments description:
 <screen>
 <![CDATA[
 Usage: opennlp TokenNameFinderEvaluator[.evalita|.ad|.conll03|.bionlp2004|.conll02|.muc6|.ontonotes|.brat] 
-        [-nameTypes types] [-misclassified true|false] -model model [-detailedF true|false] 
+        [-nameTypes types] -model model [-misclassified true|false] [-detailedF true|false] 
         [-reportOutputFile outputFile] -data sampleData [-encoding charsetName] 
 Arguments description:
 	-nameTypes types
 		name types to use for evaluation
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-detailedF true|false
-		if true will print detailed FMeasure results.
+		if true (default) will print detailed FMeasure results.
 	-reportOutputFile outputFile
 		the path of the fine-grained report file.
 	-data sampleData
@@ -1863,14 +1863,14 @@ Arguments description:
 <tbody>
 <row>
 <entry morerows='3' valign='middle'>evalita</entry>
-<entry>lang</entry>
-<entry>it</entry>
+<entry>types</entry>
+<entry>per,loc,org,gpe</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,gpe</entry>
+<entry>lang</entry>
+<entry>it</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -1894,18 +1894,18 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -1913,14 +1913,14 @@ Arguments description:
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll03</entry>
-<entry>lang</entry>
-<entry>en|de</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>eng|deu</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -1957,14 +1957,14 @@ Arguments description:
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll02</entry>
-<entry>lang</entry>
-<entry>es|nl</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>es|nl</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -2059,8 +2059,8 @@ Arguments description:
 Usage: opennlp 
         TokenNameFinderCrossValidator[.evalita|.ad|.conll03|.bionlp2004|.conll02|.muc6|.ontonotes|.brat] 
         [-factory factoryName] [-resources resourcesDir] [-type modelType] [-featuregen featuregenFile] 
-        [-nameTypes types] [-sequenceCodec codec] [-params paramsFile] -lang language [-folds num] 
-        [-misclassified true|false] [-detailedF true|false] [-reportOutputFile outputFile] -data sampleData 
+        [-nameTypes types] [-sequenceCodec codec] [-params paramsFile] -lang language [-misclassified 
+        true|false] [-folds num] [-detailedF true|false] [-reportOutputFile outputFile] -data sampleData 
         [-encoding charsetName] 
 Arguments description:
 	-factory factoryName
@@ -2079,12 +2079,12 @@ Arguments description:
 		training parameters file.
 	-lang language
 		language which is being processed.
-	-folds num
-		number of folds, default is 10.
 	-misclassified true|false
 		if true will print false negatives and false positives.
+	-folds num
+		number of folds, default is 10.
 	-detailedF true|false
-		if true will print detailed FMeasure results.
+		if true (default) will print detailed FMeasure results.
 	-reportOutputFile outputFile
 		the path of the fine-grained report file.
 	-data sampleData
@@ -2101,14 +2101,14 @@ Arguments description:
 <tbody>
 <row>
 <entry morerows='3' valign='middle'>evalita</entry>
-<entry>lang</entry>
-<entry>it</entry>
+<entry>types</entry>
+<entry>per,loc,org,gpe</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,gpe</entry>
+<entry>lang</entry>
+<entry>it</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -2132,18 +2132,18 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -2151,14 +2151,14 @@ Arguments description:
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll03</entry>
-<entry>lang</entry>
-<entry>en|de</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>eng|deu</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -2195,14 +2195,14 @@ Arguments description:
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll02</entry>
-<entry>lang</entry>
-<entry>es|nl</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>es|nl</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -2305,14 +2305,14 @@ Usage: opennlp TokenNameFinderConverter help|evalita|ad|conll03|bionlp2004|conll
 <tbody>
 <row>
 <entry morerows='3' valign='middle'>evalita</entry>
-<entry>lang</entry>
-<entry>it</entry>
+<entry>types</entry>
+<entry>per,loc,org,gpe</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,gpe</entry>
+<entry>lang</entry>
+<entry>it</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -2336,18 +2336,18 @@ Usage: opennlp TokenNameFinderConverter help|evalita|ad|conll03|bionlp2004|conll
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>splitHyphenatedTokens</entry>
 <entry>split</entry>
 <entry>Yes</entry>
 <entry>If true all hyphenated tokens will be separated (default true)</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -2355,14 +2355,14 @@ Usage: opennlp TokenNameFinderConverter help|evalita|ad|conll03|bionlp2004|conll
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll03</entry>
-<entry>lang</entry>
-<entry>en|de</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>eng|deu</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -2399,14 +2399,14 @@ Usage: opennlp TokenNameFinderConverter help|evalita|ad|conll03|bionlp2004|conll
 </row>
 <row>
 <entry morerows='3' valign='middle'>conll02</entry>
-<entry>lang</entry>
-<entry>es|nl</entry>
+<entry>types</entry>
+<entry>per,loc,org,misc</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
 <row>
-<entry>types</entry>
-<entry>per,loc,org,misc</entry>
+<entry>lang</entry>
+<entry>es|nl</entry>
 <entry>No</entry>
 <entry></entry>
 </row>
@@ -2498,13 +2498,13 @@ Usage: opennlp TokenNameFinderConverter help|evalita|ad|conll03|bionlp2004|conll
 
 <screen>
 <![CDATA[
-Usage: opennlp CensusDictionaryCreator [-encoding charsetName] [-lang code] -dict dict -censusData censusDict
+Usage: opennlp CensusDictionaryCreator [-encoding charsetName] [-lang code] -censusData censusDict -dict dict
 
 Arguments description:
 	-encoding charsetName
 	-lang code
-	-dict dict
 	-censusData censusDict
+	-dict dict
 
 ]]>
 </screen> 
@@ -2538,19 +2538,18 @@ Usage: opennlp POSTagger model < sentences
 
 <screen>
 <![CDATA[
-Usage: opennlp POSTaggerTrainer[.ad|.conllx|.parse|.ontonotes] [-factory factoryName] [-type 
-        maxent|perceptron|perceptron_sequence] [-dict dictionaryPath] [-ngram cutoff] [-tagDictCutoff 
-        tagDictCutoff] [-params paramsFile] -lang language -model modelFile -data sampleData [-encoding 
-        charsetName] 
+Usage: opennlp POSTaggerTrainer[.ad|.conllx|.parse|.ontonotes|.conllu] [-factory factoryName] [-resources 
+        resourcesDir] [-featuregen featuregenFile] [-dict dictionaryPath] [-tagDictCutoff tagDictCutoff] 
+        [-params paramsFile] -lang language -model modelFile -data sampleData [-encoding charsetName] 
 Arguments description:
 	-factory factoryName
 		A sub-class of POSTaggerFactory where to get implementation and resources.
-	-type maxent|perceptron|perceptron_sequence
-		The type of the token name finder model. One of maxent|perceptron|perceptron_sequence.
+	-resources resourcesDir
+		The resources directory
+	-featuregen featuregenFile
+		The feature generator descriptor file
 	-dict dictionaryPath
 		The XML tag dictionary file
-	-ngram cutoff
-		NGram cutoff. If not specified will not create ngram dictionary.
 	-tagDictCutoff tagDictCutoff
 		TagDictionary cutoff. If specified will create/expand a mutable TagDictionary
 	-params paramsFile
@@ -2579,12 +2578,6 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>expandME</entry>
 <entry>expandME</entry>
 <entry>Yes</entry>
@@ -2597,6 +2590,12 @@ Arguments description:
 <entry>Combine POS Tags with word features, like number and gender.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -2635,6 +2634,25 @@ Arguments description:
 <entry>No</entry>
 <entry></entry>
 </row>
+<row>
+<entry morerows='2' valign='middle'>conllu</entry>
+<entry>tagset</entry>
+<entry>tagset</entry>
+<entry>Yes</entry>
+<entry>U|x u for unified tags and x for language-specific part-of-speech tags</entry>
+</row>
+<row>
+<entry>data</entry>
+<entry>sampleData</entry>
+<entry>No</entry>
+<entry>Data to be used, usually a file name.</entry>
+</row>
+<row>
+<entry>encoding</entry>
+<entry>charsetName</entry>
+<entry>Yes</entry>
+<entry>Encoding for reading and writing text, if absent the system default is used.</entry>
+</row>
 </tbody>
 </tgroup></informaltable>
 
@@ -2648,13 +2666,13 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp POSTaggerEvaluator[.ad|.conllx|.parse|.ontonotes] [-misclassified true|false] -model model 
-        [-reportOutputFile outputFile] -data sampleData [-encoding charsetName] 
+Usage: opennlp POSTaggerEvaluator[.ad|.conllx|.parse|.ontonotes|.conllu] -model model [-misclassified 
+        true|false] [-reportOutputFile outputFile] -data sampleData [-encoding charsetName] 
 Arguments description:
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-reportOutputFile outputFile
 		the path of the fine-grained report file.
 	-data sampleData
@@ -2677,12 +2695,6 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>expandME</entry>
 <entry>expandME</entry>
 <entry>Yes</entry>
@@ -2695,6 +2707,12 @@ Arguments description:
 <entry>Combine POS Tags with word features, like number and gender.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -2733,6 +2751,25 @@ Arguments description:
 <entry>No</entry>
 <entry></entry>
 </row>
+<row>
+<entry morerows='2' valign='middle'>conllu</entry>
+<entry>tagset</entry>
+<entry>tagset</entry>
+<entry>Yes</entry>
+<entry>U|x u for unified tags and x for language-specific part-of-speech tags</entry>
+</row>
+<row>
+<entry>data</entry>
+<entry>sampleData</entry>
+<entry>No</entry>
+<entry>Data to be used, usually a file name.</entry>
+</row>
+<row>
+<entry>encoding</entry>
+<entry>charsetName</entry>
+<entry>Yes</entry>
+<entry>Encoding for reading and writing text, if absent the system default is used.</entry>
+</row>
 </tbody>
 </tgroup></informaltable>
 
@@ -2746,23 +2783,23 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp POSTaggerCrossValidator[.ad|.conllx|.parse|.ontonotes] [-folds num] [-misclassified 
-        true|false] [-factory factoryName] [-type maxent|perceptron|perceptron_sequence] [-dict 
-        dictionaryPath] [-ngram cutoff] [-tagDictCutoff tagDictCutoff] [-params paramsFile] -lang language 
-        [-reportOutputFile outputFile] -data sampleData [-encoding charsetName] 
+Usage: opennlp POSTaggerCrossValidator[.ad|.conllx|.parse|.ontonotes|.conllu] [-misclassified true|false] 
+        [-folds num] [-factory factoryName] [-resources resourcesDir] [-featuregen featuregenFile] [-dict 
+        dictionaryPath] [-tagDictCutoff tagDictCutoff] [-params paramsFile] -lang language [-reportOutputFile 
+        outputFile] -data sampleData [-encoding charsetName] 
 Arguments description:
-	-folds num
-		number of folds, default is 10.
 	-misclassified true|false
 		if true will print false negatives and false positives.
+	-folds num
+		number of folds, default is 10.
 	-factory factoryName
 		A sub-class of POSTaggerFactory where to get implementation and resources.
-	-type maxent|perceptron|perceptron_sequence
-		The type of the token name finder model. One of maxent|perceptron|perceptron_sequence.
+	-resources resourcesDir
+		The resources directory
+	-featuregen featuregenFile
+		The feature generator descriptor file
 	-dict dictionaryPath
 		The XML tag dictionary file
-	-ngram cutoff
-		NGram cutoff. If not specified will not create ngram dictionary.
 	-tagDictCutoff tagDictCutoff
 		TagDictionary cutoff. If specified will create/expand a mutable TagDictionary
 	-params paramsFile
@@ -2791,12 +2828,6 @@ Arguments description:
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>expandME</entry>
 <entry>expandME</entry>
 <entry>Yes</entry>
@@ -2809,6 +2840,12 @@ Arguments description:
 <entry>Combine POS Tags with word features, like number and gender.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -2847,6 +2884,25 @@ Arguments description:
 <entry>No</entry>
 <entry></entry>
 </row>
+<row>
+<entry morerows='2' valign='middle'>conllu</entry>
+<entry>tagset</entry>
+<entry>tagset</entry>
+<entry>Yes</entry>
+<entry>U|x u for unified tags and x for language-specific part-of-speech tags</entry>
+</row>
+<row>
+<entry>data</entry>
+<entry>sampleData</entry>
+<entry>No</entry>
+<entry>Data to be used, usually a file name.</entry>
+</row>
+<row>
+<entry>encoding</entry>
+<entry>charsetName</entry>
+<entry>Yes</entry>
+<entry>Encoding for reading and writing text, if absent the system default is used.</entry>
+</row>
 </tbody>
 </tgroup></informaltable>
 
@@ -2856,11 +2912,11 @@ Arguments description:
 
 <title>POSTaggerConverter</title>
 
-<para>Converts foreign data formats (ad,conllx,parse,ontonotes) to native OpenNLP format</para>
+<para>Converts foreign data formats (ad,conllx,parse,ontonotes,conllu) to native OpenNLP format</para>
 
 <screen>
 <![CDATA[
-Usage: opennlp POSTaggerConverter help|ad|conllx|parse|ontonotes [help|options...]
+Usage: opennlp POSTaggerConverter help|ad|conllx|parse|ontonotes|conllu [help|options...]
 
 ]]>
 </screen> 
@@ -2877,12 +2933,6 @@ Usage: opennlp POSTaggerConverter help|ad|conllx|parse|ontonotes [help|options..
 <entry>Encoding for reading and writing text, if absent the system default is used.</entry>
 </row>
 <row>
-<entry>lang</entry>
-<entry>language</entry>
-<entry>No</entry>
-<entry>Language which is being processed.</entry>
-</row>
-<row>
 <entry>expandME</entry>
 <entry>expandME</entry>
 <entry>Yes</entry>
@@ -2895,6 +2945,12 @@ Usage: opennlp POSTaggerConverter help|ad|conllx|parse|ontonotes [help|options..
 <entry>Combine POS Tags with word features, like number and gender.</entry>
 </row>
 <row>
+<entry>lang</entry>
+<entry>language</entry>
+<entry>No</entry>
+<entry>Language which is being processed.</entry>
+</row>
+<row>
 <entry>data</entry>
 <entry>sampleData</entry>
 <entry>No</entry>
@@ -2933,6 +2989,25 @@ Usage: opennlp POSTaggerConverter help|ad|conllx|parse|ontonotes [help|options..
 <entry>No</entry>
 <entry></entry>
 </row>
+<row>
+<entry morerows='2' valign='middle'>conllu</entry>
+<entry>tagset</entry>
+<entry>tagset</entry>
+<entry>Yes</entry>
+<entry>U|x u for unified tags and x for language-specific part-of-speech tags</entry>
+</row>
+<row>
+<entry>data</entry>
+<entry>sampleData</entry>
+<entry>No</entry>
+<entry>Data to be used, usually a file name.</entry>
+</row>
+<row>
+<entry>encoding</entry>
+<entry>charsetName</entry>
+<entry>Yes</entry>
+<entry>Encoding for reading and writing text, if absent the system default is used.</entry>
+</row>
 </tbody>
 </tgroup></informaltable>
 
@@ -2966,7 +3041,7 @@ Usage: opennlp LemmatizerME model < sentences
 
 <screen>
 <![CDATA[
-Usage: opennlp LemmatizerTrainerME [-factory factoryName] [-params paramsFile] -lang language -model 
+Usage: opennlp LemmatizerTrainerME[.conllu] [-factory factoryName] [-params paramsFile] -lang language -model 
         modelFile -data sampleData [-encoding charsetName] 
 Arguments description:
 	-factory factoryName
@@ -2989,6 +3064,25 @@ Arguments description:
 <informaltable frame='all'><tgroup cols='4' align='left' colsep='1' rowsep='1'>
 <thead><row><entry>Format</entry><entry>Argument</entry><entry>Value</entry><entry>Optional</entry><entry>Description</entry></row></thead>
 <tbody>
+<row>
+<entry morerows='2' valign='middle'>conllu</entry>
+<entry>tagset</entry>
+<entry>tagset</entry>
+<entry>Yes</entry>
+<entry>U|x u for unified tags and x for language-specific part-of-speech tags</entry>
+</row>
+<row>
+<entry>data</entry>
+<entry>sampleData</entry>
+<entry>No</entry>
+<entry>Data to be used, usually a file name.</entry>
+</row>
+<row>
+<entry>encoding</entry>
+<entry>charsetName</entry>
+<entry>Yes</entry>
+<entry>Encoding for reading and writing text, if absent the system default is used.</entry>
+</row>
 </tbody>
 </tgroup></informaltable>
 
@@ -3002,13 +3096,13 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp LemmatizerEvaluator [-misclassified true|false] -model model [-reportOutputFile outputFile] 
-        -data sampleData [-encoding charsetName] 
+Usage: opennlp LemmatizerEvaluator[.conllu] -model model [-misclassified true|false] [-reportOutputFile 
+        outputFile] -data sampleData [-encoding charsetName] 
 Arguments description:
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-reportOutputFile outputFile
 		the path of the fine-grained report file.
 	-data sampleData
@@ -3023,6 +3117,25 @@ Arguments description:
 <informaltable frame='all'><tgroup cols='4' align='left' colsep='1' rowsep='1'>
 <thead><row><entry>Format</entry><entry>Argument</entry><entry>Value</entry><entry>Optional</entry><entry>Description</entry></row></thead>
 <tbody>
+<row>
+<entry morerows='2' valign='middle'>conllu</entry>
+<entry>tagset</entry>
+<entry>tagset</entry>
+<entry>Yes</entry>
+<entry>U|x u for unified tags and x for language-specific part-of-speech tags</entry>
+</row>
+<row>
+<entry>data</entry>
+<entry>sampleData</entry>
+<entry>No</entry>
+<entry>Data to be used, usually a file name.</entry>
+</row>
+<row>
+<entry>encoding</entry>
+<entry>charsetName</entry>
+<entry>Yes</entry>
+<entry>Encoding for reading and writing text, if absent the system default is used.</entry>
+</row>
 </tbody>
 </tgroup></informaltable>
 
@@ -3123,15 +3236,15 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp ChunkerEvaluator[.ad] [-misclassified true|false] -model model [-detailedF true|false] -data 
+Usage: opennlp ChunkerEvaluator[.ad] -model model [-misclassified true|false] [-detailedF true|false] -data 
         sampleData [-encoding charsetName] 
 Arguments description:
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-detailedF true|false
-		if true will print detailed FMeasure results.
+		if true (default) will print detailed FMeasure results.
 	-data sampleData
 		data to be used, usually a file name.
 	-encoding charsetName
@@ -3188,8 +3301,9 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp ChunkerCrossValidator[.ad] [-factory factoryName] [-params paramsFile] -lang language [-folds 
-        num] [-misclassified true|false] [-detailedF true|false] -data sampleData [-encoding charsetName] 
+Usage: opennlp ChunkerCrossValidator[.ad] [-factory factoryName] [-params paramsFile] -lang language 
+        [-misclassified true|false] [-folds num] [-detailedF true|false] -data sampleData [-encoding 
+        charsetName] 
 Arguments description:
 	-factory factoryName
 		A sub-class of ChunkerFactory where to get implementation and resources.
@@ -3197,12 +3311,12 @@ Arguments description:
 		training parameters file.
 	-lang language
 		language which is being processed.
-	-folds num
-		number of folds, default is 10.
 	-misclassified true|false
 		if true will print false negatives and false positives.
+	-folds num
+		number of folds, default is 10.
 	-detailedF true|false
-		if true will print detailed FMeasure results.
+		if true (default) will print detailed FMeasure results.
 	-data sampleData
 		data to be used, usually a file name.
 	-encoding charsetName
@@ -3399,13 +3513,13 @@ Arguments description:
 
 <screen>
 <![CDATA[
-Usage: opennlp ParserEvaluator[.ontonotes|.frenchtreebank] [-misclassified true|false] -model model -data 
+Usage: opennlp ParserEvaluator[.ontonotes|.frenchtreebank] -model model [-misclassified true|false] -data 
         sampleData [-encoding charsetName] 
 Arguments description:
-	-misclassified true|false
-		if true will print false negatives and false positives.
 	-model model
 		the model file to be evaluated.
+	-misclassified true|false
+		if true will print false negatives and false positives.
 	-data sampleData
 		data to be used, usually a file name.
 	-encoding charsetName
@@ -3633,15 +3747,15 @@ Usage: opennlp EntityLinker model < sentences
 
 <title>Languagemodel</title>
 
-<section id='tools.cli.languagemodel.LanguageModel'>
+<section id='tools.cli.languagemodel.NGramLanguageModel'>
 
-<title>LanguageModel</title>
+<title>NGramLanguageModel</title>
 
-<para>Gives the probability of a sequence of tokens in a language model</para>
+<para>Gives the probability and most probable next token(s) of a sequence of tokens in a language model</para>
 
 <screen>
 <![CDATA[
-Usage: opennlp LanguageModel model
+Usage: opennlp NGramLanguageModel model
 
 ]]>
 </screen>