Posted to dev@opennlp.apache.org by GitBox <gi...@apache.org> on 2022/10/18 14:38:41 UTC

[GitHub] [opennlp] atarora opened a new pull request, #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

atarora opened a new pull request, #426:
URL: https://github.com/apache/opennlp/pull/426

   Thank you for contributing to Apache OpenNLP.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
   - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
   
   ### For documentation related changes:
   - [x] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ![Screenshot 2022-10-18 at 16 37 22](https://user-images.githubusercontent.com/5163715/196461503-ca924a44-43d6-45bf-ae1b-bc4f2d17c578.png)
   
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@opennlp.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [opennlp] atarora commented on pull request #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

Posted by GitBox <gi...@apache.org>.
atarora commented on PR #426:
URL: https://github.com/apache/opennlp/pull/426#issuecomment-1290592612

   Updated the doc to reflect this, per the suggestions, as shown below:
   <img width="1056" alt="Screenshot 2022-10-25 at 15 44 23" src="https://user-images.githubusercontent.com/5163715/197791037-9eed444a-7aa5-4b76-b20c-864d1cf6a50e.png">
   




[GitHub] [opennlp] jzonthemtn merged pull request #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

Posted by GitBox <gi...@apache.org>.
jzonthemtn merged PR #426:
URL: https://github.com/apache/opennlp/pull/426




[GitHub] [opennlp] jzonthemtn commented on pull request #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

Posted by GitBox <gi...@apache.org>.
jzonthemtn commented on PR #426:
URL: https://github.com/apache/opennlp/pull/426#issuecomment-1290734751

   Thanks @atarora! 




[GitHub] [opennlp] jzonthemtn commented on a diff in pull request #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

Posted by GitBox <gi...@apache.org>.
jzonthemtn commented on code in PR #426:
URL: https://github.com/apache/opennlp/pull/426#discussion_r1003288846


##########
opennlp-docs/src/docbkx/tokenizer.xml:
##########
@@ -258,37 +258,39 @@ Arguments description:
 				To train the english tokenizer use the following command:
 				<screen>
 			    <![CDATA[
-$ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt -lang en -data en-token.train -encoding UTF-8
+$ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt isAlphaNumOpt -lang en -data en-token.train -encoding UTF-8

Review Comment:
   Looking at this again, should this be `true` instead of `isAlphaNumOpt`? Like:
   
   ```
   $ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt true -lang en -data en-token.train -encoding UTF-8
   ```
   
   Looking at `TrainingParams.java`, `getAlphaNumOpt()` returns a `Boolean` and defaults to `false`. Since it defaults to `false`, I guess it makes sense for the example to be `true`...?
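   
   A minimal sketch of why the placeholder value matters, assuming the CLI hands the argument to Java's `Boolean.parseBoolean` (an assumption; this has not been checked against OpenNLP's argument parser). The class `BooleanFlagSketch` and the method `readBooleanFlag` are hypothetical names used only for illustration:
   
   ```java
   public class BooleanFlagSketch {
   
       /**
        * Returns the value that follows {@code flag} in {@code args}, parsed as a boolean.
        * Falls back to false when the flag is absent or is the last argument.
        * Hypothetical helper, not part of OpenNLP.
        */
       public static boolean readBooleanFlag(String[] args, String flag) {
           for (int i = 0; i < args.length - 1; i++) {
               if (flag.equals(args[i])) {
                   // Boolean.parseBoolean returns true only for "true" (case-insensitive).
                   return Boolean.parseBoolean(args[i + 1]);
               }
           }
           return false;
       }
   
       public static void main(String[] args) {
           String[] withTrue = {"-alphaNumOpt", "true", "-lang", "en"};
           String[] withPlaceholder = {"-alphaNumOpt", "isAlphaNumOpt", "-lang", "en"};
           System.out.println(readBooleanFlag(withTrue, "-alphaNumOpt"));
           System.out.println(readBooleanFlag(withPlaceholder, "-alphaNumOpt"));
       }
   }
   ```
   
   Under that assumption, a literal `isAlphaNumOpt` would behave the same as the default `false`, which is one more reason for the example to use `true`.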





[GitHub] [opennlp] atarora commented on a diff in pull request #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

Posted by GitBox <gi...@apache.org>.
atarora commented on code in PR #426:
URL: https://github.com/apache/opennlp/pull/426#discussion_r1004275029


##########
opennlp-docs/src/docbkx/tokenizer.xml:
##########
@@ -258,37 +258,39 @@ Arguments description:
 				To train the english tokenizer use the following command:
 				<screen>
 			    <![CDATA[
-$ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt -lang en -data en-token.train -encoding UTF-8
+$ opennlp TokenizerTrainer -model en-token.bin -alphaNumOpt isAlphaNumOpt -lang en -data en-token.train -encoding UTF-8

Review Comment:
   Thank you for noticing this, @jzonthemtn. I certainly see the difference:
   
   `$opennlp TokenizerTrainer -model en-token-test.bin -alphaNumOpt true -lang en -data en-token.train -encoding UTF-8 -cutoff 5
   Indexing events with TwoPass using cutoff of 5
   
   	Computing event counts...  done. 45 events
   	Indexing...  done.
   Sorting and merging events... done. Reduced 45 events to 25.
   Done indexing in 0,09 s.
   Incorporating indexed data for training...
   done.
   	Number of Event Tokens: 25
   	    Number of Outcomes: 2
   	  Number of Predicates: 18
   ...done.
   
   
   
   $opennlp TokenizerTrainer -model en-token-test.bin -alphaNumOpt false -lang en -data en-token.train -encoding UTF-8 -cutoff 5
   Indexing events with TwoPass using cutoff of 5
   
   	Computing event counts...  done. 212 events
   	Indexing...  done.
   Sorting and merging events... done. Reduced 212 events to 171.
   Done indexing in 0,12 s.
   Incorporating indexed data for training...
   done.
   	Number of Event Tokens: 171
   	    Number of Outcomes: 2
   	  Number of Predicates: 75
   ...done.`
   
   Worth updating the doc.
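   
   A rough sketch of why the event count drops with `-alphaNumOpt true`, assuming the option skips purely alphanumeric tokens as split candidates and that every position inside a remaining token yields one training event. The class, the method, and the regular expression below are illustrative assumptions, not OpenNLP code:
   
   ```java
   import java.util.regex.Pattern;
   
   public class AlphaNumOptSketch {
   
       // Assumed definition of a "purely alphanumeric" token; illustrative only.
       private static final Pattern ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");
   
       /**
        * Counts candidate split positions in a whitespace-separated line.
        * Every position strictly inside a token counts as one candidate event.
        * With alphaNumOpt enabled, purely alphanumeric tokens contribute nothing.
        */
       public static int countCandidateEvents(String line, boolean alphaNumOpt) {
           int events = 0;
           for (String token : line.trim().split("\\s+")) {
               if (token.length() < 2) {
                   continue; // no interior position to split at
               }
               if (alphaNumOpt && ALPHANUMERIC.matcher(token).matches()) {
                   continue; // skipped by the (assumed) optimization
               }
               events += token.length() - 1;
           }
           return events;
       }
   
       public static void main(String[] args) {
           String line = "Mr. Smith arrived at 10:30, didn't he?";
           System.out.println("alphaNumOpt=false: " + countCandidateEvents(line, false));
           System.out.println("alphaNumOpt=true:  " + countCandidateEvents(line, true));
       }
   }
   ```
   
   The counts this sketch produces are not meant to reproduce the 45 and 212 events above; it only illustrates the direction of the difference.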





[GitHub] [opennlp] jzonthemtn commented on pull request #426: OPENNLP-1387 : Fix alphaNumOpt in tokenizer example

Posted by GitBox <gi...@apache.org>.
jzonthemtn commented on PR #426:
URL: https://github.com/apache/opennlp/pull/426#issuecomment-1282663467

   Thanks @atarora! 



