Posted to dev@opennlp.apache.org by GitBox <gi...@apache.org> on 2023/01/19 14:36:56 UTC

[GitHub] [opennlp-sandbox] mawiesne opened a new pull request, #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

mawiesne opened a new pull request, #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59

   - adjusts opennlp-tools to 2.1.0
   - adjusts parent project (org.apache.apache) to version 18
   - adjusts Java language level to 11
   - adds missing test resources to check whether the existing tests work; some do, some don't...
   - ignores tests that aren't functional even with corresponding test resources


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@opennlp.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [opennlp-sandbox] mawiesne commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397348084

   > How large would they be?
   
   - senseval3: ~6.9 MB (uncompressed)
   - opennlp models: ~8.9 MB (compressed/binary)




[GitHub] [opennlp-sandbox] rzo1 commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
rzo1 commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082237692


##########
opennlp-wsd/pom.xml:
##########
@@ -25,20 +25,20 @@
 	<parent>
 		<groupId>org.apache</groupId>
 		<artifactId>apache</artifactId>
-		<version>13</version>
+		<version>18</version>

Review Comment:
   https://issues.apache.org/jira/browse/OPENNLP-1452





[GitHub] [opennlp-sandbox] rzo1 commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
rzo1 commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082234738


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -68,11 +72,11 @@ public class WSDHelper {
       "RBR", "RBS", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ" };
 
   // List of Negation Words
-  public static ArrayList<String> negationWords = new ArrayList<String>(
+  public static ArrayList<String> negationWords = new ArrayList<>(

Review Comment:
   List? (as HashMap was also changed to Map in this class a few lines above)





[GitHub] [opennlp-sandbox] rzo1 commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
rzo1 commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082216933


##########
opennlp-wsd/pom.xml:
##########
@@ -25,20 +25,20 @@
 	<parent>
 		<groupId>org.apache</groupId>
 		<artifactId>apache</artifactId>
-		<version>13</version>
+		<version>18</version>

Review Comment:
   Can we go for `29` ? We can update it in `opennlp` as well (mostly build support, etc.)



##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -480,14 +482,10 @@ public static HashMap<String, Object> getEnglishWords(String dict) {
    */
   public static POS getPOS(String posTag) {
 
-    ArrayList<String> adjective = new ArrayList<String>(Arrays.asList("JJ",
-        "JJR", "JJS"));
-    ArrayList<String> adverb = new ArrayList<String>(Arrays.asList("RB", "RBR",
-        "RBS"));
-    ArrayList<String> noun = new ArrayList<String>(Arrays.asList("NN", "NNS",
-        "NNP", "NNPS"));
-    ArrayList<String> verb = new ArrayList<String>(Arrays.asList("VB", "VBD",
-        "VBG", "VBN", "VBP", "VBZ"));
+    List<String> adjective = new ArrayList<>(Arrays.asList("JJ", "JJR", "JJS"));

Review Comment:
   The `new ArrayList()` calls are not needed. 
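
A minimal sketch of what this comment points at, assuming the tag lists are only queried and never modified. The class name `PosTagGroups` and the `category` method are hypothetical and not part of opennlp-wsd; the tags are the ones from the diff above. With the Java 11 language level, `List.of` builds the lists directly, so no `ArrayList` copy is required.

```java
import java.util.List;

// Hypothetical helper for illustration; not a class from opennlp-wsd.
class PosTagGroups {

  // List.of (Java 9+) returns an immutable list, so wrapping it in a new
  // ArrayList is unnecessary when the list is only used for lookups.
  static final List<String> ADJECTIVE = List.of("JJ", "JJR", "JJS");
  static final List<String> ADVERB = List.of("RB", "RBR", "RBS");
  static final List<String> NOUN = List.of("NN", "NNS", "NNP", "NNPS");
  static final List<String> VERB = List.of("VB", "VBD", "VBG", "VBN", "VBP", "VBZ");

  // Maps a Penn Treebank tag to a coarse category name, or null if unknown.
  static String category(String posTag) {
    if (posTag == null) {
      return null; // immutable lists throw NullPointerException on contains(null)
    }
    if (ADJECTIVE.contains(posTag)) return "adjective";
    if (ADVERB.contains(posTag)) return "adverb";
    if (NOUN.contains(posTag)) return "noun";
    if (VERB.contains(posTag)) return "verb";
    return null;
  }

  public static void main(String[] args) {
    System.out.println(category("NNP"));
    System.out.println(category("VBZ"));
  }
}
```

One difference to keep in mind: `Arrays.asList` tolerates `contains(null)`, while the lists returned by `List.of` do not, hence the explicit null check.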



##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -631,7 +629,7 @@ public static ArrayList<WordPOS> getAllRelevantWords(String[] sentence) {
   public static ArrayList<String> StemWordWithWordNet(WordPOS wordToStem) {
     if (wordToStem == null)
       return null;
-    ArrayList<String> stems = new ArrayList<String>();
+    ArrayList<String> stems = new ArrayList<>();

Review Comment:
   List?



##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -606,7 +604,7 @@ public static boolean areStringArraysEqual(String[] array1, String[] array2) {
 
   public static ArrayList<WordPOS> getAllRelevantWords(String[] sentence) {
 
-    ArrayList<WordPOS> relevantWords = new ArrayList<WordPOS>();
+    ArrayList<WordPOS> relevantWords = new ArrayList<>();

Review Comment:
   List?



##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/datareader/SensevalReader.java:
##########
@@ -218,9 +240,9 @@ public ArrayList<WSDSample> getSensevalData(String wordTag) {
                       String textAfter = nChild.getChildNodes().item(2)
                           .getTextContent();
 
-                      ArrayList<String> textBeforeTokenzed = new ArrayList<String>(
+                      ArrayList<String> textBeforeTokenzed = new ArrayList<>(

Review Comment:
   No need for the `new` operator if we use the `List` interface.



##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -68,11 +72,11 @@ public class WSDHelper {
       "RBR", "RBS", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ" };
 
   // List of Negation Words
-  public static ArrayList<String> negationWords = new ArrayList<String>(
+  public static ArrayList<String> negationWords = new ArrayList<>(

Review Comment:
   There is a switch from `HashMap` to `Map` -> also switch from `ArrayList` to `List` ?



##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/datareader/SemcorReaderExtended.java:
##########
@@ -200,7 +202,7 @@ private ArrayList<Sentence> readFile(String file) {
    */
   private ArrayList<WSDSample> getSemcorOneFileData(String file, String wordTag) {
 
-    ArrayList<WSDSample> setInstances = new ArrayList<WSDSample>();
+    ArrayList<WSDSample> setInstances = new ArrayList<>();

Review Comment:
   List?





[GitHub] [opennlp-sandbox] mawiesne commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397202927

   > create tests using the large files in a separate suite
   
   Could be covered via a Maven profile in a separate PR.
   
   For now, it would be okay to add those larger (binary) files to an existing sandbox project, right? Making those structures prettier would be a next step, of course.




[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082234405


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -606,7 +604,7 @@ public static boolean areStringArraysEqual(String[] array1, String[] array2) {
 
   public static ArrayList<WordPOS> getAllRelevantWords(String[] sentence) {
 
-    ArrayList<WordPOS> relevantWords = new ArrayList<WordPOS>();
+    ArrayList<WordPOS> relevantWords = new ArrayList<>();

Review Comment:
   Will check.





[GitHub] [opennlp-sandbox] mawiesne commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1398109754

   > Added some (non blocking) comments.
   
   @rzo1 Comments resolved where applicable.




[GitHub] [opennlp-sandbox] mawiesne commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397172400

   @kinow I'd like to draw your attention to the test resources directory and the files I added there. What's your opinion on this amount of data? IMHO, we need to have some sort of tests running; on the other hand, it's quite a lot. Any ideas how we can achieve a tradeoff here, that is, make it more lightweight?




[GitHub] [opennlp-sandbox] kinow commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
kinow commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397233005

   >For now, it would be okay to add those larger (binary) files to an existing sandbox project, right? Making those structures prettier would be a next step, oc.
   
   How large would they be? I am not syncing the project every day, so it won't affect me. As long as it's in the tens of MB, that should be fine, I think. In Apache Commons Imaging I think we included some 4-5 MB images needed for testing at one point, but we are trying to keep the test data small (and the tests there don't need the same amount as for a model in opennlp).




[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082255558


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -631,7 +629,7 @@ public static ArrayList<WordPOS> getAllRelevantWords(String[] sentence) {
   public static ArrayList<String> StemWordWithWordNet(WordPOS wordToStem) {
     if (wordToStem == null)
       return null;
-    ArrayList<String> stems = new ArrayList<String>();
+    ArrayList<String> stems = new ArrayList<>();

Review Comment:
   see other comment, not this PR.





[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082233356


##########
opennlp-wsd/pom.xml:
##########
@@ -25,20 +25,20 @@
 	<parent>
 		<groupId>org.apache</groupId>
 		<artifactId>apache</artifactId>
-		<version>13</version>
+		<version>18</version>

Review Comment:
   Good idea. I wanted to keep it in sync with the opennlp core project. Will take note for the next PRs, once the core is referencing "29". Feel free to open an issue there to make this happen soon.





[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082255758


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/datareader/SemcorReaderExtended.java:
##########
@@ -200,7 +202,7 @@ private ArrayList<Sentence> readFile(String file) {
    */
   private ArrayList<WSDSample> getSemcorOneFileData(String file, String wordTag) {
 
-    ArrayList<WSDSample> setInstances = new ArrayList<WSDSample>();
+    ArrayList<WSDSample> setInstances = new ArrayList<>();

Review Comment:
   see other comment, not this PR.





[GitHub] [opennlp-sandbox] mawiesne commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397347783

   > How large would they be?
   
   - senseval3: ~6.9 MB (uncompressed)
   - opennlp models: ~8.9 MB (compressed/binary)




[GitHub] [opennlp-sandbox] kinow commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
kinow commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397367283

   > > How large would they be?
   > 
   >     * senseval3: ~6.9 MB (uncompressed)
   > 
   >     * opennlp models: ~8.9 MB (compressed/binary)
   
   I think these are fine for now, especially given we can try to reduce the size or use another approach in a follow-up issue :+1: 




[GitHub] [opennlp-sandbox] mawiesne commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397495350

   > * senseval3: ~6.9 MB (uncompressed)
   > 
   > * opennlp models: ~8.9 MB (compressed/binary)
   
   @kinow I gzipped some plain resources and implemented code to read them in compressed form. This way, we are down to: 
   
   * senseval3: ~2.1 MB (compressed)
   * opennlp models: ~3.2 MB (compressed/binary)
   
   *cheers 🍺 
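
The PR's actual reader code is not shown in this thread. As a rough sketch of the general approach, assuming a line-oriented UTF-8 resource, a compressed file can be read transparently by wrapping the raw stream in `java.util.zip.GZIPInputStream`. The class `GzipResourceSketch` and its methods are hypothetical names used only for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not the code added in the PR.
class GzipResourceSketch {

  // Reads all lines from a stream, decompressing first when gzipped is true.
  static List<String> readLines(InputStream raw, boolean gzipped) {
    try (InputStream source = raw;
         InputStream in = gzipped ? new GZIPInputStream(source) : source;
         BufferedReader reader =
             new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      List<String> lines = new ArrayList<>();
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
      return lines;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Compresses a string in memory; only used to produce demo input.
  static byte[] gzip(String text) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The gzip trailer is written on close, so read the buffer afterwards.
    return buffer.toByteArray();
  }

  public static void main(String[] args) {
    byte[] compressed = gzip("first line\nsecond line\n");
    System.out.println(readLines(new ByteArrayInputStream(compressed), true));
  }
}
```

In a test, the raw stream would typically come from `getResourceAsStream` on a `.gz` file under the test resources directory; the resource names used in the PR are not given here.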




[GitHub] [opennlp-sandbox] mawiesne merged pull request #59: Update sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by "mawiesne (via GitHub)" <gi...@apache.org>.
mawiesne merged PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59




[GitHub] [opennlp-sandbox] rzo1 commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
rzo1 commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082234738


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -68,11 +72,11 @@ public class WSDHelper {
       "RBR", "RBS", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ" };
 
   // List of Negation Words
-  public static ArrayList<String> negationWords = new ArrayList<String>(
+  public static ArrayList<String> negationWords = new ArrayList<>(

Review Comment:
   List?





[GitHub] [opennlp-sandbox] kinow commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
kinow commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1081427293


##########
opennlp-wsd/src/test/java/opennlp/tools/disambiguator/LeskEvaluatorTest.java:
##########
@@ -23,23 +23,24 @@
 
 import opennlp.tools.disambiguator.datareader.SensevalReader;
 
+import org.junit.Ignore;
 import org.junit.Test;
 
 public class LeskEvaluatorTest {
 
   static SensevalReader seReader = new SensevalReader();
 
   @Test
-  public static void main(String[] args) {
+  @Ignore // TODO Investigate why test fails while parsing 'EnglishLS.train'

Review Comment:
   :+1: maybe create an issue for later if we are not planning to work on this soon.



##########
opennlp-wsd/src/main/java/opennlp/tools/cmdline/disambiguator/DisambiguatorTool.java:
##########
@@ -117,11 +120,12 @@ static ObjectStream<WSDSample> openSampleData(String sampleDataName,
       File sampleDataFile, Charset encoding) {
     CmdLineUtil.checkInputFile(sampleDataName + " Data", sampleDataFile);
 
-    FileInputStream sampleDataIn = CmdLineUtil.openInFile(sampleDataFile);
-
-    ObjectStream<String> lineStream = new PlainTextByLineStream(
-        sampleDataIn.getChannel(), encoding);
-
-    return new WSDSampleStream(lineStream);
+    try {
+      MarkableFileInputStreamFactory factory = new MarkableFileInputStreamFactory(sampleDataFile);
+      ObjectStream<String> lineStream = new ParagraphStream(new PlainTextByLineStream(factory, encoding));

Review Comment:
   Sorry, I'm not on my laptop with an IDE to confirm, but if these two are `Closeable`s, perhaps they could be moved into a `try-with-resources`, if that makes sense.
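
For reference, a minimal sketch of the `try-with-resources` pattern using only JDK classes. It does not reproduce the OpenNLP stream types from the diff; `TryWithResourcesSketch` and `countLines` are hypothetical names for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example for illustration; uses JDK readers, not OpenNLP streams.
class TryWithResourcesSketch {

  // Counts lines and closes both readers when the block exits,
  // whether it exits normally or with an exception.
  static int countLines(Reader source) {
    // Resources are closed in reverse order of declaration.
    try (Reader in = source;
         BufferedReader reader = new BufferedReader(in)) {
      int lines = 0;
      while (reader.readLine() != null) {
        lines++;
      }
      return lines;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(countLines(new StringReader("first\nsecond\n")));
  }
}
```

The pattern fits when the stream is fully consumed inside the block. If a method hands the open stream back to its caller, as the `openSampleData` signature in the diff suggests, closing it in a `try-with-resources` inside that method would return an already closed stream, so the caller has to take over closing it.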





[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1081516840


##########
opennlp-wsd/src/main/java/opennlp/tools/cmdline/disambiguator/DisambiguatorTool.java:
##########
@@ -117,11 +120,12 @@ static ObjectStream<WSDSample> openSampleData(String sampleDataName,
       File sampleDataFile, Charset encoding) {
     CmdLineUtil.checkInputFile(sampleDataName + " Data", sampleDataFile);
 
-    FileInputStream sampleDataIn = CmdLineUtil.openInFile(sampleDataFile);
-
-    ObjectStream<String> lineStream = new PlainTextByLineStream(
-        sampleDataIn.getChannel(), encoding);
-
-    return new WSDSampleStream(lineStream);
+    try {
+      MarkableFileInputStreamFactory factory = new MarkableFileInputStreamFactory(sampleDataFile);
+      ObjectStream<String> lineStream = new ParagraphStream(new PlainTextByLineStream(factory, encoding));

Review Comment:
   Fixed.





[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1081516507


##########
opennlp-wsd/src/test/java/opennlp/tools/disambiguator/LeskEvaluatorTest.java:
##########
@@ -23,23 +23,24 @@
 
 import opennlp.tools.disambiguator.datareader.SensevalReader;
 
+import org.junit.Ignore;
 import org.junit.Test;
 
 public class LeskEvaluatorTest {
 
   static SensevalReader seReader = new SensevalReader();
 
   @Test
-  public static void main(String[] args) {
+  @Ignore // TODO Investigate why test fails while parsing 'EnglishLS.train'

Review Comment:
   https://issues.apache.org/jira/browse/OPENNLP-1446





[GitHub] [opennlp-sandbox] kinow commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
kinow commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397192288

   > @kinow I'd like to point your interest to the test resources directory and the files I added herein. What's your opinion on the bunch of data. IMHO, it is required to have some sort of tests running...; on the other hand it's quite a mass. Any ideas how we can achieve a tradeoff here, that is, have it more lightweight?
   
   Thanks for pointing that out, @mawiesne . I am using the GH UI, and skipped the test files without noticing their sizes.
   
   Does compressing these files help us here? If so we could include them as gzip/some other compression algo, and decompress when running the tests?
   
   Otherwise, what I used in another project that needed some large files for tests was the following setup:
   
   - enable LFS in Git
   - create tests using the large files in a separate suite
   - disable the suite by default, with a toggle in the build to enable that (i.e. you assume the user/dev will have enabled LFS and pulled the large files)
   
   The final alternative I can think of is storing the files in some ASF host somewhere, if possible, and do something similar to the setup with Git LFS, but asking users to download these files to run the tests in certain suites.
   
   Not sure if really helpful.
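
The "disabled by default, with a toggle" idea from the list above could look roughly like this on the Java side. This is only a sketch: the property name `wsd.largeTests` and the class `LargeTestToggle` are made up for illustration, and nothing in the thread says the project uses such a flag.

```java
// Hypothetical opt-in switch for tests that need large resources.
class LargeTestToggle {

  // Hypothetical property name, chosen for this sketch only.
  static final String PROPERTY = "wsd.largeTests";

  // True only when the system property is set to "true" (case-insensitive).
  static boolean largeTestsEnabled() {
    return Boolean.getBoolean(PROPERTY);
  }

  public static void main(String[] args) {
    if (largeTestsEnabled()) {
      System.out.println("running tests that need the large resources");
    } else {
      System.out.println("skipping tests that need the large resources");
    }
  }
}
```

In a JUnit 4 test, the result could be passed to `org.junit.Assume.assumeTrue(...)` at the start of the test or in a `@Before` method, so that the test is reported as skipped rather than failed when the flag is off.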




[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082234165


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -68,11 +72,11 @@ public class WSDHelper {
       "RBR", "RBS", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ" };
 
   // List of Negation Words
-  public static ArrayList<String> negationWords = new ArrayList<String>(
+  public static ArrayList<String> negationWords = new ArrayList<>(

Review Comment:
   Unclear to me.





[GitHub] [opennlp-sandbox] mawiesne commented on a diff in pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
mawiesne commented on code in PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#discussion_r1082255136


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/WSDHelper.java:
##########
@@ -606,7 +604,7 @@ public static boolean areStringArraysEqual(String[] array1, String[] array2) {
 
   public static ArrayList<WordPOS> getAllRelevantWords(String[] sentence) {
 
-    ArrayList<WordPOS> relevantWords = new ArrayList<WordPOS>();
+    ArrayList<WordPOS> relevantWords = new ArrayList<>();

Review Comment:
   Method signature returns ArrayList and is used quite frequently under this assumption. Won't change (in this PR).
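
To illustrate why the local variable cannot simply become a `List` while the signature stays as it is, here is a small sketch. The names are hypothetical, and `String` stands in for `WordPOS` to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example for illustration; String stands in for WordPOS.
class ReturnTypeSketch {

  // Current shape: the concrete type is part of the signature, so the local
  // variable must also be an ArrayList for the return statement to compile.
  static ArrayList<String> relevantWordsConcrete(String[] sentence) {
    ArrayList<String> words = new ArrayList<>();
    for (String token : sentence) {
      if (!token.isEmpty()) {
        words.add(token);
      }
    }
    return words;
  }

  // Interface-based shape: callers that only iterate or index keep working,
  // but callers that assign the result to an ArrayList variable stop compiling.
  static List<String> relevantWordsInterface(String[] sentence) {
    return relevantWordsConcrete(sentence);
  }

  public static void main(String[] args) {
    String[] sentence = {"not", "", "bad"};
    List<String> viaInterface = relevantWordsInterface(sentence);
    // A caller that really needs an ArrayList has to copy explicitly.
    ArrayList<String> copy = new ArrayList<>(viaInterface);
    System.out.println(copy);
  }
}
```

Widening the return type is therefore a change that touches every caller relying on `ArrayList`, which fits the decision to leave it out of this PR.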





[GitHub] [opennlp-sandbox] kinow commented on pull request #59: Updates sandbox component 'opennlp-wsd' to be compatible with latest opennlp-tools release

Posted by GitBox <gi...@apache.org>.
kinow commented on PR #59:
URL: https://github.com/apache/opennlp-sandbox/pull/59#issuecomment-1397504756

   >I gziped some plain resource and implemented code to handle reading in compressed forms. This way, we are down to:
   
   Excellent! I think these sizes are good for now, so no need to worry about that for a while :+1: Thanks @mawiesne !

