Posted to commits@opennlp.apache.org by jo...@apache.org on 2011/05/31 12:51:40 UTC

svn commit: r1129615 - /incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml

Author: joern
Date: Tue May 31 10:51:39 2011
New Revision: 1129615

URL: http://svn.apache.org/viewvc?rev=1129615&view=rev
Log:
OPENNLP-194 Fixed too long lines

Modified:
    incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml

Modified: incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml
URL: http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml?rev=1129615&r1=1129614&r2=1129615&view=diff
==============================================================================
--- incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml (original)
+++ incubator/opennlp/trunk/opennlp-docs/src/docbkx/tokenizer.xml Tue May 31 10:51:39 2011
@@ -28,7 +28,8 @@
 			<![CDATA[
 Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
 Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
-Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate.
+Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields
+    PLC, was named a director of this British industrial conglomerate.
 			]]>
 		 </programlisting>
 
@@ -39,8 +40,11 @@ Rudolph Agnew, 55 years old and former c
 			<![CDATA[
 Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
 Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
-Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a nonexecutive director of this British industrial conglomerate . 
-A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago , researchers reported . 
+Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC ,
+    was named a nonexecutive director of this British industrial conglomerate . 
+A form of asbestos once used to make Kent cigarette filters has caused a high
+    percentage of cancer deaths among a group of workers exposed to it more than 30 years ago ,
+    researchers reported . 
 			]]>
 		 	</programlisting>
 
@@ -127,7 +131,8 @@ Showa Shell gained 20 to 1,570 and Mitsu
 Sumitomo Metal Mining fell five yen to 692 and Nippon Mining added 15 to 960 .
 Among other winners Wednesday was Nippon Shokubai , which was up 80 at 2,410 .
 Marubeni advanced 11 to 890 .
-London share prices were bolstered largely by continued gains on Wall Street and technical factors affecting demand for London 's blue-chip stocks .
+London share prices were bolstered largely by continued gains on Wall Street and technical 
+    factors affecting demand for London 's blue-chip stocks .
 ...etc...]]>
 		 </screen>
 			Of course this is all on the command line. Many people use the models
@@ -230,14 +235,16 @@ double tokenProbs[] = tokenizer.getToken
 			<![CDATA[
 Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board as a nonexecutive director Nov. 29<SPLIT>.
 Mr. Vinken is chairman of Elsevier N.V.<SPLIT>, the Dutch publishing group<SPLIT>.
-Rudolph Agnew<SPLIT>, 55 years old and former chairman of Consolidated Gold Fields PLC<SPLIT>, was named a nonexecutive director of this British industrial conglomerate<SPLIT>. 
+Rudolph Agnew<SPLIT>, 55 years old and former chairman of Consolidated Gold Fields PLC<SPLIT>,
+    was named a nonexecutive director of this British industrial conglomerate<SPLIT>. 
 			]]>		
 			</programlisting>
 			Usage of the tool:
 			<screen>
 			<![CDATA[
 $ bin/opennlp TokenizerTrainer
-Usage: opennlp TokenizerTrainer-lang language -encoding charset [-iterations num] [-cutoff num] [-alphaNumOpt] -data trainingData -model model
+Usage: opennlp TokenizerTrainer-lang language -encoding charset [-iterations num] \ 
+[-cutoff num] [-alphaNumOpt] -data trainingData -model model
 -lang language     specifies the language which is being processed.
 -encoding charset  specifies the encoding which should be used for reading and writing text.
 -iterations num    specified the number of training iterations
@@ -248,7 +255,8 @@ Usage: opennlp TokenizerTrainer-lang lan
 			To train the english tokenizer use the following command:
 			<screen>
 			<![CDATA[
-$ bin/opennlp TokenizerTrainer -encoding UTF-8 -lang en -alphaNumOpt -data en-token.train -model en-token.bin
+$ bin/opennlp TokenizerTrainer -encoding UTF-8 -lang en -alphaNumOpt \ 
++-data en-token.train -model en-token.bin
 Indexing events using cutoff of 5
 
 	Computing event counts...  done. 262271 events