Posted to dev@opennlp.apache.org by GitBox <gi...@apache.org> on 2022/10/25 10:08:01 UTC

[GitHub] [opennlp] atarora commented on a diff in pull request #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

atarora commented on code in PR #426:
URL: https://github.com/apache/opennlp/pull/426#discussion_r1004275029


##########
opennlp-docs/src/docbkx/tokenizer.xml:
##########
@@ -258,37 +258,39 @@ Arguments description:
 				To train the english tokenizer use the following command:
 				<screen>
 			    <![CDATA[
-$ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt -lang en -data en-token.train -encoding UTF-8
+$ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt isAlphaNumOpt -lang en -data en-token.train -encoding UTF-8

Review Comment:
   Thank you for taking notice of this, @jzonthemtn. I certainly see the difference:
   
   `$opennlp TokenizerTrainer -model en-token-test.bin -alphaNumOpt true -lang en -data en-token.train -encoding UTF-8 -cutoff 5
   Indexing events with TwoPass using cutoff of 5
   
   	Computing event counts...  done. 45 events
   	Indexing...  done.
   Sorting and merging events... done. Reduced 45 events to 25.
   Done indexing in 0,09 s.
   Incorporating indexed data for training...
   done.
   	Number of Event Tokens: 25
   	    Number of Outcomes: 2
   	  Number of Predicates: 18
   ...done.
   
   
   
   $opennlp TokenizerTrainer -model en-token-test.bin -alphaNumOpt false -lang en -data en-token.train -encoding UTF-8 -cutoff 5
   Indexing events with TwoPass using cutoff of 5
   
   	Computing event counts...  done. 212 events
   	Indexing...  done.
   Sorting and merging events... done. Reduced 212 events to 171.
   Done indexing in 0,12 s.
   Incorporating indexed data for training...
   done.
   	Number of Event Tokens: 171
   	    Number of Outcomes: 2
   	  Number of Predicates: 75
   ...done.`
   
   Worth updating the doc.
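
For anyone wondering why the event counts differ (45 with `-alphaNumOpt true`, 212 with `false`), here is a rough, self-contained sketch of the idea. It assumes that with the optimization on, whitespace-delimited chunks that are purely alphanumeric are skipped when generating training events, and that every other chunk contributes one candidate split event per inner character position. The class name, method name, and the pattern `^[A-Za-z0-9]+$` are illustrative assumptions written from memory, not code taken from OpenNLP.

```java
import java.util.regex.Pattern;

public class AlphaNumOptSketch {

    // Assumed stand-in for the tokenizer's alphanumeric pattern (hypothetical here).
    private static final Pattern ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    /**
     * Counts candidate split positions in a simplified model:
     * each whitespace-delimited chunk of length n >= 2 yields n - 1 events,
     * unless alphaNumOpt is set and the chunk is purely alphanumeric.
     */
    public static int countCandidateEvents(String text, boolean alphaNumOpt) {
        int events = 0;
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.length() < 2) {
                continue; // a single character has no inner split position
            }
            if (alphaNumOpt && ALPHANUMERIC.matcher(chunk).matches()) {
                continue; // optimization: assume such chunks never need splitting
            }
            events += chunk.length() - 1;
        }
        return events;
    }

    public static void main(String[] args) {
        String line = "Hello, world 42";
        System.out.println(countCandidateEvents(line, true));  // prints 5
        System.out.println(countCandidateEvents(line, false)); // prints 10
    }
}
```

In this toy model only `Hello,` produces events when the option is on, which is the same kind of reduction seen in the two training runs above.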



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
