Posted to dev@opennlp.apache.org by GitBox <gi...@apache.org> on 2022/12/04 14:51:04 UTC

[GitHub] [opennlp] mawiesne opened a new pull request, #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

mawiesne opened a new pull request, #446:
URL: https://github.com/apache/opennlp/pull/446

   Change
   -
   - adds missing JavaDoc
   - improves existing documentation for clarity
   - removes superfluous text
   - adds 'final' modifier where useful and applicable
   - adds 'Override' annotation where useful and applicable
   - sanitizes some unused, deprecated code fragments, mostly in `POSTaggerFactory`
   - reduces visibility of some methods in `POSTaggerFactory`, as those were only used 'internally'
   - fixes some typos
   - addresses an open comment from OPENNLP-1403 by reviewer 'kinow' in `DefaultLanguageDetectorContextGenerator`
   
   Tasks
   -
   Thank you for contributing to Apache OpenNLP.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
   - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
   
   ### For documentation related changes:
   - [x] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@opennlp.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [opennlp] rzo1 commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
rzo1 commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038998936


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -363,11 +382,11 @@ public void validateArtifactMap() throws InvalidFormatException {
     if (ngramDictEntry != null && !(ngramDictEntry instanceof Dictionary)) {
       throw new InvalidFormatException("NGram dictionary has wrong type!");
     }
-
   }
 
+  // reduced visibility to ensure deprecation is respected in future versions
   @Deprecated

Review Comment:
   > +1 to having it well documented so devs/committers are aware of what can be done or not.
   
   +1 - we need some clear guidance / documentation. People need to know what to expect if they are using deprecated things. Judging from the cleanup process, we deprecated things (e.g. via a Javadoc comment) but never got around to actually cleaning them up or removing them ;-)
   
   > Sure. IMHO, it's important to define under which circumstances deprecation is tolerable, or in other words how to evolve an API. I've seen spots in the OpenNLP code where only a JavaDoc comment signals deprecation, so the compiler wouldn't warn devs (external or "homies").
   
   I am +1 with @mawiesne - the best thing would be to find some consensus on how to deal with deprecated things between releases and to document it, so that we have a written agreement that isn't forgotten or kept only in our "heads" :-)
   
   That said, I really appreciate @mawiesne's effort to increase the readability & clarity of the code! :)





[GitHub] [opennlp] mawiesne commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038990369


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -95,8 +83,19 @@ public POSTaggerFactory(byte[] featureGeneratorBytes, final Map<String, Object>
     this.posDictionary = posDictionary;
   }
 
+
+  // reduced visibility to ensure deprecation is respected in future versions
+  @Deprecated
+  POSTaggerFactory(Dictionary ngramDictionary, TagDictionary posDictionary) {
+    this.init(ngramDictionary, posDictionary);
+
+    // TODO: This could be made functional by creating some default feature generation
+    // which uses the dictionary ...

Review Comment:
   This comment was there previously. Maybe I just moved it. So no actual change is intended here.





[GitHub] [opennlp] jzonthemtn merged pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
jzonthemtn merged PR #446:
URL: https://github.com/apache/opennlp/pull/446




[GitHub] [opennlp] mawiesne commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038990245


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -225,49 +240,51 @@ public void setTagDictionary(TagDictionary dictionary) {
     this.posDictionary = dictionary;
   }
 
+  /**
+   * @return The key-value based resources map, or an empty map.
+   */
   protected Map<String, Object> getResources() {
-
-
     if (resources != null) {
       return resources;
     }
 
     return Collections.emptyMap();
   }
 
+  /**
+   * @return The feature generator bytes used.
+   */
   protected byte[] getFeatureGenerator() {
     return featureGeneratorBytes;
   }
 
+  /**
+   * @return The {@link TagDictionary} used.
+   */
   public TagDictionary getTagDictionary() {
     if (this.posDictionary == null && artifactProvider != null)
       this.posDictionary = artifactProvider.getArtifact(TAG_DICTIONARY_ENTRY_NAME);
     return this.posDictionary;
   }
 
-  /**
-   * @deprecated this will be reduced in visibility and later removed
-   */
-  @Deprecated
-  public Dictionary getDictionary() {
+  @Deprecated // will be removed when only 8 series models are supported
+  private Dictionary getDictionary() {

Review Comment:
   See my earlier comment on how to document deprecated code.





[GitHub] [opennlp] kinow commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
kinow commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038993301


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -363,11 +382,11 @@ public void validateArtifactMap() throws InvalidFormatException {
     if (ngramDictEntry != null && !(ngramDictEntry instanceof Dictionary)) {
       throw new InvalidFormatException("NGram dictionary has wrong type!");
     }
-
   }
 
+  // reduced visibility to ensure deprecation is respected in future versions
   @Deprecated

Review Comment:
   Yeah, I know Commons really tries to be as backward compatible as possible, even avoiding updating the minimum JVM version. Jena is also careful about breaking certain parts of the public API that are known/expected to be used in other systems.
   
   I don't know about OpenNLP though. +1 to having it well documented so devs/committers are aware of what can be done or not. Thanks for understanding, @mawiesne!





[GitHub] [opennlp] mawiesne commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038990610


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTagger.java:
##########
@@ -26,14 +26,38 @@ public interface POSTagger {
 
   /**
    * Assigns the sentence of tokens pos tags.
+   *
    * @param sentence The sentence of tokens to be tagged.
-   * @return an array of pos tags for each token provided in sentence.
+   * @return An array of pos tags for each token provided in {@code sentence}.
    */
   String[] tag(String[] sentence);
 
+  /**
+   * Assigns the sentence of tokens pos tags.
+   *
+   * @param sentence The sentence of tokens to be tagged.
+   * @param additionalContext The context to provide additional information with.
+   *                          
+   * @return An array of pos tags for each token provided in {@code sentence}.
+   */
   String[] tag(String[] sentence, Object[] additionalContext);
 
+  /**
+   * Assigns the sentence the top-k {@link Sequence sequences}.
+   *
+   * @param sentence The sentence of tokens to be tagged.
+   *
+   * @return An array of {@link Sequence sequeneces} for each token provided in {@code sentence}.
+   */
   Sequence[] topKSequences(String[] sentence);
 
+  /**
+   * Assigns the sentence the top-k {@link Sequence sequences}.
+   *
+   * @param sentence The sentence of tokens to be tagged.
+   * @param additionalContext The context to provide additional information with.
+   *
+   * @return An array of {@link Sequence sequeneces} for each token provided in {@code sentence}.

Review Comment:
   Nice find. My 👀 seem to be (too) tired for today.





[GitHub] [opennlp] mawiesne commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038989984


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -363,11 +382,11 @@ public void validateArtifactMap() throws InvalidFormatException {
     if (ngramDictEntry != null && !(ngramDictEntry instanceof Dictionary)) {
       throw new InvalidFormatException("NGram dictionary has wrong type!");
     }
-
   }
 
+  // reduced visibility to ensure deprecation is respected in future versions
   @Deprecated

Review Comment:
   I've been thinking about this line for a while. There are so many places that are deprecated _and_ unused. Is it worth documenting things that are both? Or should we remove them, or reduce their visibility, so that future releases gently push users to change their own code and drop deprecated stuff that has been in this state for (many,) many years?
   
   Agreed that we need a policy on how to clean up or handle (internally unused!) deprecated things.
   
   The same applies to all other spots where I felt an adjustment makes sense.
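
One way such a policy could be made machine-checkable, sketched here under assumptions: since Java 9 the `@Deprecated` annotation accepts `since` and `forRemoval` elements, and both are readable via reflection. `ExampleFactory`, its methods, and the version string are hypothetical and are not taken from OpenNLP.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RemovalPolicyCheck {

  /** Hypothetical factory used only for illustration; not an OpenNLP class. */
  static class ExampleFactory {

    /**
     * @deprecated Unused internally and scheduled for removal.
     */
    @Deprecated(since = "x.y.z", forRemoval = true) // version string is a placeholder
    public void legacyInit() {
    }

    public void init() {
    }
  }

  /** Lists the declared methods of a type that are flagged with forRemoval = true. */
  static List<String> scheduledForRemoval(Class<?> type) {
    List<String> names = new ArrayList<>();
    for (Method m : type.getDeclaredMethods()) {
      Deprecated d = m.getAnnotation(Deprecated.class);
      if (d != null && d.forRemoval()) {
        names.add(m.getName());
      }
    }
    Collections.sort(names); // reflection order is unspecified, so sort for stable output
    return names;
  }

  public static void main(String[] args) {
    System.out.println(scheduledForRemoval(ExampleFactory.class));
  }
}
```

A check like this could run in a unit test, so that anything flagged for removal is listed before each release instead of lingering for years.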







[GitHub] [opennlp] mawiesne commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038994759


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -363,11 +382,11 @@ public void validateArtifactMap() throws InvalidFormatException {
     if (ngramDictEntry != null && !(ngramDictEntry instanceof Dictionary)) {
       throw new InvalidFormatException("NGram dictionary has wrong type!");
     }
-
   }
 
+  // reduced visibility to ensure deprecation is respected in future versions
   @Deprecated

Review Comment:
   With @jzonthemtn in mind, I'm open to either direction for this PR. Maybe Jeff can quickly check whether, in this case, the change in `POSTaggerFactory` can be done without hurting the OpenNLP community (too) much. If it hurts: I'll try to go back and leave that "init..." part untouched.





[GitHub] [opennlp] mawiesne commented on pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on PR #446:
URL: https://github.com/apache/opennlp/pull/446#issuecomment-1336443640

   🎯




[GitHub] [opennlp] mawiesne commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038994460


##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -363,11 +382,11 @@ public void validateArtifactMap() throws InvalidFormatException {
     if (ngramDictEntry != null && !(ngramDictEntry instanceof Dictionary)) {
       throw new InvalidFormatException("NGram dictionary has wrong type!");
     }
-
   }
 
+  // reduced visibility to ensure deprecation is respected in future versions
   @Deprecated

Review Comment:
   Sure. IMHO, it's important to define under which circumstances deprecation is tolerable, or in other words how to evolve an API. I've seen spots in the OpenNLP code where only a JavaDoc comment signals deprecation, so the compiler wouldn't warn devs (external or "homies").
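
A minimal sketch of the distinction raised above, using hypothetical class and method names that are not part of OpenNLP. A method whose deprecation is mentioned only in Javadoc prose carries no annotation that tooling can read, whereas a method annotated with `@Deprecated` does, and javac reports a deprecation warning to code that calls it.

```java
import java.lang.reflect.Method;

public class DeprecationMarkers {

  /** Hypothetical legacy API, for illustration only. */
  static class LegacyApi {

    /**
     * Deprecated, please use {@link #current()} instead.
     * (Prose only: there is no annotation on this method.)
     */
    public String commentOnly() {
      return current();
    }

    /**
     * @deprecated Use {@link #current()} instead.
     */
    @Deprecated
    public String annotated() {
      return current();
    }

    public String current() {
      return "ok";
    }
  }

  /** Reports whether the named public no-arg method carries the @Deprecated annotation. */
  static boolean isMarkedDeprecated(Class<?> type, String methodName) {
    try {
      Method m = type.getMethod(methodName);
      return m.isAnnotationPresent(Deprecated.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isMarkedDeprecated(LegacyApi.class, "commentOnly")); // false
    System.out.println(isMarkedDeprecated(LegacyApi.class, "annotated"));   // true
  }
}
```

Pairing the `@deprecated` Javadoc tag with the annotation keeps the human-readable explanation and the machine-readable marker together.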





[GitHub] [opennlp] kinow commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
kinow commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038987138


##########
opennlp-tools/src/main/java/opennlp/tools/postag/WordTagSampleStream.java:
##########
@@ -58,6 +61,7 @@ public POSSample read() throws IOException {
       try {
         sample = POSSample.parse(sentence);
       } catch (InvalidFormatException e) {
+        // TODO: An exception in error case should be thrown.

Review Comment:
   :+1: 



##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -363,11 +382,11 @@ public void validateArtifactMap() throws InvalidFormatException {
     if (ngramDictEntry != null && !(ngramDictEntry instanceof Dictionary)) {
       throw new InvalidFormatException("NGram dictionary has wrong type!");
     }
-
   }
 
+  // reduced visibility to ensure deprecation is respected in future versions
   @Deprecated

Review Comment:
   We are stricter about changing visibility or removing public methods in Commons between non-major releases, but I think it should be fine for OpenNLP, as its API is not expected to be used by most users (i.e. most should be using the tools to train/infer with models instead). Even then, it is worth adding a note to the changelog about it IMO.



##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTagger.java:
##########
@@ -26,14 +26,38 @@ public interface POSTagger {
 
   /**
    * Assigns the sentence of tokens pos tags.
+   *
    * @param sentence The sentence of tokens to be tagged.
-   * @return an array of pos tags for each token provided in sentence.
+   * @return An array of pos tags for each token provided in {@code sentence}.
    */
   String[] tag(String[] sentence);
 
+  /**
+   * Assigns the sentence of tokens pos tags.
+   *
+   * @param sentence The sentence of tokens to be tagged.
+   * @param additionalContext The context to provide additional information with.
+   *                          
+   * @return An array of pos tags for each token provided in {@code sentence}.
+   */
   String[] tag(String[] sentence, Object[] additionalContext);
 
+  /**
+   * Assigns the sentence the top-k {@link Sequence sequences}.
+   *
+   * @param sentence The sentence of tokens to be tagged.
+   *
+   * @return An array of {@link Sequence sequeneces} for each token provided in {@code sentence}.
+   */
   Sequence[] topKSequences(String[] sentence);
 
+  /**
+   * Assigns the sentence the top-k {@link Sequence sequences}.
+   *
+   * @param sentence The sentence of tokens to be tagged.
+   * @param additionalContext The context to provide additional information with.
+   *
+   * @return An array of {@link Sequence sequeneces} for each token provided in {@code sentence}.

Review Comment:
   s/sequeneces/sequences



##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -95,8 +83,19 @@ public POSTaggerFactory(byte[] featureGeneratorBytes, final Map<String, Object>
     this.posDictionary = posDictionary;
   }
 
+
+  // reduced visibility to ensure deprecation is respected in future versions
+  @Deprecated
+  POSTaggerFactory(Dictionary ngramDictionary, TagDictionary posDictionary) {
+    this.init(ngramDictionary, posDictionary);
+
+    // TODO: This could be made functional by creating some default feature generation
+    // which uses the dictionary ...

Review Comment:
   :ok_man: 



##########
opennlp-tools/src/main/java/opennlp/tools/postag/TagDictionary.java:
##########
@@ -25,11 +25,18 @@
 public interface TagDictionary {
 
   /**
-   * Returns a list of valid tags for the specified word.
+   * Retrieves a list of valid tags for the specified {@code word}.
    *
    * @param word The word.
-   * @return A list of valid tags for the specified word or null if no information
-   * is available for that word.
+   * @return An array of valid tags for the specified {@code word} or {@code null} if
+   *         no information is available for that word.
    */
   String[] getTags(String word);
+
+  /**
+   * Whether if the dictionary is case-sensitive or not.

Review Comment:
   I think we can drop the `if` (i.e. Whether the dictionary is... or not).
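
To illustrate the contract documented in this hunk (an array of tags, or `null` when nothing is known about the word), here is a small sketch. The local `TagDictionary` interface is a stand-in written for this example; the method name `isCaseSensitive()` is an assumption, since the diff above shows only its Javadoc, and `MapTagDictionary` is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SimpleTagDictionaryDemo {

  /** Local stand-in for the interface under review; not the OpenNLP type itself. */
  interface TagDictionary {
    String[] getTags(String word);

    boolean isCaseSensitive(); // assumed name, the diff shows only the Javadoc
  }

  /** Hypothetical in-memory dictionary backed by a map. */
  static final class MapTagDictionary implements TagDictionary {
    private final Map<String, String[]> entries = new HashMap<>();
    private final boolean caseSensitive;

    MapTagDictionary(boolean caseSensitive) {
      this.caseSensitive = caseSensitive;
    }

    void put(String word, String... tags) {
      entries.put(normalize(word), tags.clone());
    }

    @Override
    public String[] getTags(String word) {
      String[] tags = entries.get(normalize(word));
      // null signals "no information available", as the Javadoc describes
      return tags == null ? null : tags.clone();
    }

    @Override
    public boolean isCaseSensitive() {
      return caseSensitive;
    }

    private String normalize(String word) {
      return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    }
  }

  public static void main(String[] args) {
    MapTagDictionary dict = new MapTagDictionary(false);
    dict.put("Run", "NN", "VB");

    String[] known = dict.getTags("run");
    String[] unknown = dict.getTags("walk");
    System.out.println(known == null ? "no entry" : String.join(",", known));
    System.out.println(unknown == null ? "no entry" : String.join(",", unknown));
  }
}
```

Callers have to null-check the result, which is why spelling out the `null` case in the Javadoc matters.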



##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -225,49 +240,51 @@ public void setTagDictionary(TagDictionary dictionary) {
     this.posDictionary = dictionary;
   }
 
+  /**
+   * @return The key-value based resources map, or an empty map.
+   */
   protected Map<String, Object> getResources() {
-
-
     if (resources != null) {
       return resources;
     }
 
     return Collections.emptyMap();
   }
 
+  /**
+   * @return The feature generator bytes used.
+   */
   protected byte[] getFeatureGenerator() {
     return featureGeneratorBytes;
   }
 
+  /**
+   * @return The {@link TagDictionary} used.
+   */
   public TagDictionary getTagDictionary() {
     if (this.posDictionary == null && artifactProvider != null)
       this.posDictionary = artifactProvider.getArtifact(TAG_DICTIONARY_ENTRY_NAME);
     return this.posDictionary;
   }
 
-  /**
-   * @deprecated this will be reduced in visibility and later removed
-   */
-  @Deprecated
-  public Dictionary getDictionary() {
+  @Deprecated // will be removed when only 8 series models are supported
+  private Dictionary getDictionary() {

Review Comment:
   Ditto the note below, about changing public methods :point_down: 





[GitHub] [opennlp] mawiesne commented on a diff in pull request #446: OPENNLP-1404 Enhance JavaDoc in opennlp.tools.postag package

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038989413


##########
opennlp-tools/src/main/java/opennlp/tools/postag/TagDictionary.java:
##########
@@ -25,11 +25,18 @@
 public interface TagDictionary {
 
   /**
-   * Returns a list of valid tags for the specified word.
+   * Retrieves a list of valid tags for the specified {@code word}.
    *
    * @param word The word.
-   * @return A list of valid tags for the specified word or null if no information
-   * is available for that word.
+   * @return An array of valid tags for the specified {@code word} or {@code null} if
+   *         no information is available for that word.
    */
   String[] getTags(String word);
+
+  /**
+   * Whether if the dictionary is case-sensitive or not.

Review Comment:
   Did not spot that one, or misread the sentence in that moment.


