
Posted to dev@opennlp.apache.org by GitBox <gi...@apache.org> on 2022/12/17 21:17:28 UTC

[GitHub] [opennlp] mawiesne opened a new pull request, #461: OPENNLP-1416 Enhance JavaDoc in opennlp.tools.formats.ad package

mawiesne opened a new pull request, #461:
URL: https://github.com/apache/opennlp/pull/461

   Change
   -
   - adds missing JavaDoc
   - improves existing documentation for clarity
   - removes superfluous text
   - adds 'final' modifier where useful and applicable
   - adds 'Override' annotation where useful and applicable
   - simplifies several constructors, removing duplicate code
   - fixes non-JNC compliant naming of constants
   - fixes several typos
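The constructor simplification listed above follows the usual Java delegation pattern. The convenience constructor converts its input and hands it to the primary constructor via `this(...)`, so the field assignments live in one place. This is a minimal sketch that assumes hypothetical names (`DelegatingStream`, `readLines`) and reads from a `String` rather than an OpenNLP `InputStreamFactory`. It illustrates the pattern and is not the project's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an AD sample stream; not OpenNLP's API.
final class DelegatingStream {

  private final List<String> lines;
  private final boolean splitHyphenatedTokens;

  // Primary constructor: the only place where fields are assigned.
  DelegatingStream(List<String> lines, boolean splitHyphenatedTokens) {
    this.lines = lines;
    this.splitHyphenatedTokens = splitHyphenatedTokens;
  }

  // Convenience constructor: converts its input, then delegates via this(...).
  // Any IOException propagates to the caller instead of being caught and rewrapped.
  DelegatingStream(String text, boolean splitHyphenatedTokens) throws IOException {
    this(readLines(text), splitHyphenatedTokens);
  }

  // Static helper, so it may be called inside the explicit this(...) invocation.
  private static List<String> readLines(String text) throws IOException {
    List<String> result = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      String line;
      while ((line = reader.readLine()) != null) {
        result.add(line);
      }
    }
    return result;
  }

  int lineCount() {
    return lines.size();
  }

  boolean isSplitHyphenatedTokens() {
    return splitHyphenatedTokens;
  }
}
```

Because the delegating constructor declares `throws IOException`, it no longer needs a `try`/`catch` that wraps `UnsupportedEncodingException` (a subclass of `IOException`) in an `IllegalStateException`. The checked exception simply reaches the caller.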
   
   Tasks
   -
   Thank you for contributing to Apache OpenNLP.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
   - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
   
   ### For documentation related changes:
   - [x] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@opennlp.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [opennlp] jzonthemtn merged pull request #461: OPENNLP-1416 Enhance JavaDoc in opennlp.tools.formats.ad package

Posted by GitBox <gi...@apache.org>.

jzonthemtn merged PR #461:
URL: https://github.com/apache/opennlp/pull/461



[GitHub] [opennlp] kinow commented on a diff in pull request #461: OPENNLP-1416 Enhance JavaDoc in opennlp.tools.formats.ad package

Posted by GitBox <gi...@apache.org>.

kinow commented on code in PR #461:
URL: https://github.com/apache/opennlp/pull/461#discussion_r1052672784


##########
opennlp-tools/src/main/java/opennlp/tools/formats/ad/ADChunkSampleStreamFactory.java:
##########
@@ -36,7 +36,8 @@
  * A Factory to create a Arvores Deitadas ChunkStream from the command line
  * utility.
  * <p>
- * <b>Note:</b> Do not use this class, internal use only!
+ * <b>Note:</b>
+ * Do not use this class, internal use only!

Review Comment:
   :+1: 




[GitHub] [opennlp] rzo1 commented on a diff in pull request #461: OPENNLP-1416 Enhance JavaDoc in opennlp.tools.formats.ad package

Posted by GitBox <gi...@apache.org>.

rzo1 commented on code in PR #461:
URL: https://github.com/apache/opennlp/pull/461#discussion_r1052995489


##########
opennlp-tools/src/main/java/opennlp/tools/formats/ad/ADNameSampleStream.java:
##########
@@ -154,61 +153,49 @@ public class ADNameSampleStream implements ObjectStream<NameSample> {
 
   private final ObjectStream<ADSentenceStream.Sentence> adSentenceStream;
 
-  /**
+  /*
    * To keep the last left contraction part
    */
   private String leftContractionPart = null;
 
   private final boolean splitHyphenatedTokens;
 
   /**
-   * Creates a new {@link NameSample} stream from a line stream, i.e.
-   * {@link ObjectStream}&lt;{@link String}&gt;, that could be a
-   * {@link PlainTextByLineStream} object.
+   * Initializes a new {@link ADNameSampleStream} stream from a {@link ObjectStream<String>},
+   * that could be a {@link PlainTextByLineStream} object.
    *
-   * @param lineStream
-   *          a stream of lines as {@link String}
-   * @param splitHyphenatedTokens
-   *          if true hyphenated tokens will be separated: "carros-monstro" &gt;
-   *          "carros" "-" "monstro"
+   * @param lineStream An {@link ObjectStream<String>} as input.
+   * @param splitHyphenatedTokens If {@code true} hyphenated tokens will be separated:
+   *                              "carros-monstro" &gt; "carros" "-" "monstro".
    */
   public ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens) {
     this.adSentenceStream = new ADSentenceStream(lineStream);
     this.splitHyphenatedTokens = splitHyphenatedTokens;
   }
 
   /**
-   * Creates a new {@link NameSample} stream from a {@link InputStream}
+   * Initializes a new {@link ADNameSampleStream} from an {@link InputStreamFactory}
    *
-   * @param in
-   *          the Corpus {@link InputStream}
-   * @param charsetName
-   *          the charset of the Arvores Deitadas Corpus
-   * @param splitHyphenatedTokens
-   *          if true hyphenated tokens will be separated: "carros-monstro" &gt;
-   *          "carros" "-" "monstro"
+   * @param in The Corpus {@link InputStreamFactory}.
+   * @param charsetName  The {@link java.nio.charset.Charset charset} to use
+   *                     for reading of the corpus.
+   * @param splitHyphenatedTokens If {@code true} hyphenated tokens will be separated:
+   *                              "carros-monstro" &gt; "carros" "-" "monstro".
    */
   @Deprecated
   public ADNameSampleStream(InputStreamFactory in, String charsetName,
       boolean splitHyphenatedTokens) throws IOException {
-
-    try {
-      this.adSentenceStream = new ADSentenceStream(new PlainTextByLineStream(
-          in, charsetName));
-      this.splitHyphenatedTokens = splitHyphenatedTokens;
-    } catch (UnsupportedEncodingException e) {
-      // UTF-8 is available on all JVMs, will never happen
-      throw new IllegalStateException(e);
-    }
+    this(new PlainTextByLineStream(in, charsetName), splitHyphenatedTokens);
   }
 
   private int textID = -1;
 
+  @Override
   public NameSample read() throws IOException {
 
     Sentence paragraph;
     // we should look for text here.
-    while ((paragraph = this.adSentenceStream.read()) != null) {
+    if ((paragraph = this.adSentenceStream.read()) != null) {

Review Comment:
   That looks like the `while` was a bug previously (emptying the whole underlying stream...) - good catch!



##########
opennlp-tools/src/main/java/opennlp/tools/formats/ad/ADPOSSampleStream.java:
##########
@@ -64,35 +59,26 @@ public ADPOSSampleStream(ObjectStream<String> lineStream, boolean expandME,
   }
 
   /**
-   * Creates a new {@link POSSample} stream from a {@link InputStream}
+   * Creates a new {@link POSSample} stream from an {@link InputStreamFactory}
    *
-   * @param in
-   *          the Corpus {@link InputStream}
-   * @param charsetName
-   *          the charset of the Arvores Deitadas Corpus
-   * @param expandME
-   *          if true will expand the multiword expressions, each word of the
+   * @param in The {@link InputStreamFactory} for the corpus.
+   * @param charsetName  The {@link java.nio.charset.Charset charset} to use
+   *                     for reading of the corpus.
+   * @param expandME If {@code true} will expand the multiword expressions, each word of the
    *          expression will have the POS Tag that was attributed to the
-   *          expression plus the prefix B- or I- (CONLL convention)
-   * @param includeFeatures
-   *          if true will combine the POS Tag with the feature tags
+   *          expression plus the prefix {@code B-} or {@code I-} (CONLL convention).
+   * @param includeFeatures If {@code true} will combine the POS Tag with the feature tags.
    */
   public ADPOSSampleStream(InputStreamFactory in, String charsetName,
       boolean expandME, boolean includeFeatures) throws IOException {
 
-    try {
-      this.adSentenceStream = new ADSentenceStream(new PlainTextByLineStream(in, charsetName));
-      this.expandME = expandME;
-      this.isIncludeFeatures = includeFeatures;
-    } catch (UnsupportedEncodingException e) {
-      // UTF-8 is available on all JVMs, will never happen
-      throw new IllegalStateException(e);
-    }
+    this(new PlainTextByLineStream(in, charsetName), expandME, includeFeatures);
   }
 
+  @Override
   public POSSample read() throws IOException {
     Sentence paragraph;
-    while ((paragraph = this.adSentenceStream.read()) != null) {
+    if ((paragraph = this.adSentenceStream.read()) != null) {

Review Comment:
   :-)




[GitHub] [opennlp] mawiesne commented on a diff in pull request #461: OPENNLP-1416 Enhance JavaDoc in opennlp.tools.formats.ad package

Posted by GitBox <gi...@apache.org>.

mawiesne commented on code in PR #461:
URL: https://github.com/apache/opennlp/pull/461#discussion_r1053282816


##########
opennlp-tools/src/main/java/opennlp/tools/formats/ad/ADNameSampleStream.java:
##########
@@ -154,61 +153,49 @@ public class ADNameSampleStream implements ObjectStream<NameSample> {
 
   private final ObjectStream<ADSentenceStream.Sentence> adSentenceStream;
 
-  /**
+  /*
    * To keep the last left contraction part
    */
   private String leftContractionPart = null;
 
   private final boolean splitHyphenatedTokens;
 
   /**
-   * Creates a new {@link NameSample} stream from a line stream, i.e.
-   * {@link ObjectStream}&lt;{@link String}&gt;, that could be a
-   * {@link PlainTextByLineStream} object.
+   * Initializes a new {@link ADNameSampleStream} stream from a {@link ObjectStream<String>},
+   * that could be a {@link PlainTextByLineStream} object.
    *
-   * @param lineStream
-   *          a stream of lines as {@link String}
-   * @param splitHyphenatedTokens
-   *          if true hyphenated tokens will be separated: "carros-monstro" &gt;
-   *          "carros" "-" "monstro"
+   * @param lineStream An {@link ObjectStream<String>} as input.
+   * @param splitHyphenatedTokens If {@code true} hyphenated tokens will be separated:
+   *                              "carros-monstro" &gt; "carros" "-" "monstro".
    */
   public ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens) {
     this.adSentenceStream = new ADSentenceStream(lineStream);
     this.splitHyphenatedTokens = splitHyphenatedTokens;
   }
 
   /**
-   * Creates a new {@link NameSample} stream from a {@link InputStream}
+   * Initializes a new {@link ADNameSampleStream} from an {@link InputStreamFactory}
    *
-   * @param in
-   *          the Corpus {@link InputStream}
-   * @param charsetName
-   *          the charset of the Arvores Deitadas Corpus
-   * @param splitHyphenatedTokens
-   *          if true hyphenated tokens will be separated: "carros-monstro" &gt;
-   *          "carros" "-" "monstro"
+   * @param in The Corpus {@link InputStreamFactory}.
+   * @param charsetName  The {@link java.nio.charset.Charset charset} to use
+   *                     for reading of the corpus.
+   * @param splitHyphenatedTokens If {@code true} hyphenated tokens will be separated:
+   *                              "carros-monstro" &gt; "carros" "-" "monstro".
    */
   @Deprecated
   public ADNameSampleStream(InputStreamFactory in, String charsetName,
       boolean splitHyphenatedTokens) throws IOException {
-
-    try {
-      this.adSentenceStream = new ADSentenceStream(new PlainTextByLineStream(
-          in, charsetName));
-      this.splitHyphenatedTokens = splitHyphenatedTokens;
-    } catch (UnsupportedEncodingException e) {
-      // UTF-8 is available on all JVMs, will never happen
-      throw new IllegalStateException(e);
-    }
+    this(new PlainTextByLineStream(in, charsetName), splitHyphenatedTokens);
   }
 
   private int textID = -1;
 
+  @Override
   public NameSample read() throws IOException {
 
     Sentence paragraph;
     // we should look for text here.
-    while ((paragraph = this.adSentenceStream.read()) != null) {
+    if ((paragraph = this.adSentenceStream.read()) != null) {

Review Comment:
   The first element read is being directly returned. Therefore `while` made no sense here.
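The comment states that the loop body always returns. Under that assumption, a `while` whose body returns on its first pass behaves exactly like an `if`, because each `read()` call consumes at most one element. The hypothetical class below is not OpenNLP code. It uses an `ArrayDeque` as a stand-in for the underlying `ObjectStream`, whose `read()` returns `null` once exhausted, and shows that both variants consume the same elements.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical illustration of the read() refactoring discussed above; not OpenNLP code.
final class ReadOnce {

  // Stand-in for the underlying ObjectStream of sentences.
  private final Deque<String> source;

  ReadOnce(String... paragraphs) {
    this.source = new ArrayDeque<>(Arrays.asList(paragraphs));
  }

  // Mimics ObjectStream.read(): next element, or null once the source is exhausted.
  private String next() {
    return source.pollFirst();
  }

  // Old shape: the body returns on the first iteration, so the loop never repeats.
  String readWithWhile() {
    String paragraph;
    while ((paragraph = next()) != null) {
      return "sample:" + paragraph;
    }
    return null;
  }

  // New shape: identical behavior, but the intent (read one element) is explicit.
  String readWithIf() {
    String paragraph;
    if ((paragraph = next()) != null) {
      return "sample:" + paragraph;
    }
    return null;
  }

  int remaining() {
    return source.size();
  }
}
```

Both methods leave the same number of elements in the source after each call. The change to `if` therefore affects readability rather than behavior, provided the body keeps returning unconditionally.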


