You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/14 14:51:55 UTC

[GitHub] [lucene-solr] donnerpeter opened a new pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

donnerpeter opened a new pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202


   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Lucene or Solr:
   
   * https://issues.apache.org/jira/projects/LUCENE
   * https://issues.apache.org/jira/projects/SOLR
   
   You will need to create an account in Jira in order to create an issue.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * LUCENE-####: <short description of problem or changes>
   * SOLR-####: <short description of problem or changes>
   
   LUCENE and SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   The code produces too many warnings, so it's hard to notice newly introduced ones when developing new functionality.
   
   # Solution
   
   Apply IDE quick fixes, sometimes cleanup manually.
   
   # Tests
   
   No semantic changes, just cleanup.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request title.
   - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558227714



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -251,34 +221,33 @@ public Dictionary(
   }
 
   /** Looks up Hunspell word forms from the dictionary */
-  IntsRef lookupWord(char word[], int offset, int length) {
-    return lookup(words, word, offset, length);
+  IntsRef lookupWord(char[] word, int length) {
+    return lookup(words, word, length);
   }
 
   // only for testing
-  IntsRef lookupPrefix(char word[], int offset, int length) {
-    return lookup(prefixes, word, offset, length);
+  IntsRef lookupPrefix(char[] word) {
+    return lookup(prefixes, word, word.length);
   }
 
   // only for testing
-  IntsRef lookupSuffix(char word[], int offset, int length) {
-    return lookup(suffixes, word, offset, length);
+  IntsRef lookupSuffix(char[] word) {
+    return lookup(suffixes, word, word.length);
   }
 
-  IntsRef lookup(FST<IntsRef> fst, char word[], int offset, int length) {
+  private IntsRef lookup(FST<IntsRef> fst, char[] word, int length) {
     if (fst == null) {
       return null;
     }
     final FST.BytesReader bytesReader = fst.getBytesReader();
-    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<IntsRef>());
+    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<>());
     // Accumulate output as we go
     final IntsRef NO_OUTPUT = fst.outputs.getNoOutput();
     IntsRef output = NO_OUTPUT;
 
-    int l = offset + length;
     try {
-      for (int i = offset, cp = 0; i < l; i += Character.charCount(cp)) {

Review comment:
       I'd rather keep it in if it's already there. Seems like it's just more complete this way (you point at a range of characters at the source buffer rather than require zero-based indexes). I'd even add a test that verifies it works by randomizing the offset somewhere... 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss merged pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss merged pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558242103



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -251,34 +221,33 @@ public Dictionary(
   }
 
   /** Looks up Hunspell word forms from the dictionary */
-  IntsRef lookupWord(char word[], int offset, int length) {
-    return lookup(words, word, offset, length);
+  IntsRef lookupWord(char[] word, int length) {
+    return lookup(words, word, length);
   }
 
   // only for testing
-  IntsRef lookupPrefix(char word[], int offset, int length) {
-    return lookup(prefixes, word, offset, length);
+  IntsRef lookupPrefix(char[] word) {
+    return lookup(prefixes, word, word.length);
   }
 
   // only for testing
-  IntsRef lookupSuffix(char word[], int offset, int length) {
-    return lookup(suffixes, word, offset, length);
+  IntsRef lookupSuffix(char[] word) {
+    return lookup(suffixes, word, word.length);
   }
 
-  IntsRef lookup(FST<IntsRef> fst, char word[], int offset, int length) {
+  private IntsRef lookup(FST<IntsRef> fst, char[] word, int length) {
     if (fst == null) {
       return null;
     }
     final FST.BytesReader bytesReader = fst.getBytesReader();
-    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<IntsRef>());
+    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<>());
     // Accumulate output as we go
     final IntsRef NO_OUTPUT = fst.outputs.getNoOutput();
     IntsRef output = NO_OUTPUT;
 
-    int l = offset + length;
     try {
-      for (int i = offset, cp = 0; i < l; i += Character.charCount(cp)) {

Review comment:
       Passing a constant number gives you one test (which is fine). Passing a randomized number (0-X) gives you many tests over time. This seems stupid only until you find a bug deep down somewhere, hiding under a complex scenario of inputs. Maybe the offset is used to derive a different index, or an overflow occurs somewhere... I don't know but I've seen this too many times to call it redundant. ;)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558280430



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -572,9 +566,7 @@ private boolean checkCondition(
     affixReader.setPosition(8 * affix);
     char flag = (char) (affixReader.readShort() & 0xffff);
     affixReader.skipBytes(2); // strip

Review comment:
       Ugh. Sorry. The code contains nearly similar fragments and I was looking at the wrong section. Maybe it'd be a good idea to keep that condition variable and assert it doesn't have any non-zero bits after the one we use? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558225290



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -654,7 +646,7 @@ private boolean checkCondition(
                   true,
                   circumfix,
                   caseVariant));
-        } else if (dictionary.complexPrefixes == false && dictionary.twoStageAffix) {
+        } else if (!dictionary.complexPrefixes && dictionary.twoStageAffix) {

Review comment:
       I prefer shorter versions (especially given the short column limit of spotlessJava), and IntelliJ highlights this by default, but if you insist, I can disable the highlighting and live with it. What's the rule exactly, write `x == false` instead of `!x` everywhere?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558226224



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -654,7 +646,7 @@ private boolean checkCondition(
                   true,
                   circumfix,
                   caseVariant));
-        } else if (dictionary.complexPrefixes == false && dictionary.twoStageAffix) {
+        } else if (!dictionary.complexPrefixes && dictionary.twoStageAffix) {

Review comment:
       So do I. But @romseygeek has a different preference. I think you can follow your own gut feeling here - there are no explicit rules that we enforce.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558239126



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -654,7 +646,7 @@ private boolean checkCondition(
                   true,
                   circumfix,
                   caseVariant));
-        } else if (dictionary.complexPrefixes == false && dictionary.twoStageAffix) {
+        } else if (!dictionary.complexPrefixes && dictionary.twoStageAffix) {

Review comment:
       Then I'd shorten this :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558223221



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -251,34 +221,33 @@ public Dictionary(
   }
 
   /** Looks up Hunspell word forms from the dictionary */
-  IntsRef lookupWord(char word[], int offset, int length) {
-    return lookup(words, word, offset, length);
+  IntsRef lookupWord(char[] word, int length) {
+    return lookup(words, word, length);
   }
 
   // only for testing
-  IntsRef lookupPrefix(char word[], int offset, int length) {
-    return lookup(prefixes, word, offset, length);
+  IntsRef lookupPrefix(char[] word) {
+    return lookup(prefixes, word, word.length);
   }
 
   // only for testing
-  IntsRef lookupSuffix(char word[], int offset, int length) {
-    return lookup(suffixes, word, offset, length);
+  IntsRef lookupSuffix(char[] word) {
+    return lookup(suffixes, word, word.length);
   }
 
-  IntsRef lookup(FST<IntsRef> fst, char word[], int offset, int length) {
+  private IntsRef lookup(FST<IntsRef> fst, char[] word, int length) {
     if (fst == null) {
       return null;
     }
     final FST.BytesReader bytesReader = fst.getBytesReader();
-    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<IntsRef>());
+    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<>());
     // Accumulate output as we go
     final IntsRef NO_OUTPUT = fst.outputs.getNoOutput();
     IntsRef output = NO_OUTPUT;
 
-    int l = offset + length;
     try {
-      for (int i = offset, cp = 0; i < l; i += Character.charCount(cp)) {

Review comment:
       No one uses it now, and it's easy to add the offset back if needed. So far it doesn't seem very likely to happen.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558255660



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -251,34 +221,33 @@ public Dictionary(
   }
 
   /** Looks up Hunspell word forms from the dictionary */
-  IntsRef lookupWord(char word[], int offset, int length) {
-    return lookup(words, word, offset, length);
+  IntsRef lookupWord(char[] word, int length) {
+    return lookup(words, word, length);
   }
 
   // only for testing
-  IntsRef lookupPrefix(char word[], int offset, int length) {
-    return lookup(prefixes, word, offset, length);
+  IntsRef lookupPrefix(char[] word) {
+    return lookup(prefixes, word, word.length);
   }
 
   // only for testing
-  IntsRef lookupSuffix(char word[], int offset, int length) {
-    return lookup(suffixes, word, offset, length);
+  IntsRef lookupSuffix(char[] word) {
+    return lookup(suffixes, word, word.length);
   }
 
-  IntsRef lookup(FST<IntsRef> fst, char word[], int offset, int length) {
+  private IntsRef lookup(FST<IntsRef> fst, char[] word, int length) {
     if (fst == null) {
       return null;
     }
     final FST.BytesReader bytesReader = fst.getBytesReader();
-    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<IntsRef>());
+    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<>());
     // Accumulate output as we go
     final IntsRef NO_OUTPUT = fst.outputs.getNoOutput();
     IntsRef output = NO_OUTPUT;
 
-    int l = offset + length;
     try {
-      for (int i = offset, cp = 0; i < l; i += Character.charCount(cp)) {

Review comment:
       No pressure. :) I typically add known border cases and a randomized test on top of that. Fuzzifiers are great, really. Take a look at this bug which slipped through the cracks for so many years:
   https://mail.openjdk.java.net/pipermail/javadoc-dev/2020-December/002255.html




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558224062



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -16,14 +16,7 @@
  */
 package org.apache.lucene.analysis.hunspell;
 
-import java.io.BufferedInputStream;
-import java.io.BufferedOutputStream;
-import java.io.BufferedReader;
-import java.io.IOException;
-import java.io.InputStream;
-import java.io.InputStreamReader;
-import java.io.LineNumberReader;
-import java.io.OutputStream;
+import java.io.*;

Review comment:
       Darn. I see you're right. They should be expanded - enforcement of this has been lost somewhere during ant-gradle migration. gjf doesn't deal with this [1].
   
   You can also leave it in if you wish - automatic import expansion/ ordering can be done later (please feel free to file a separate issue!).
   
   https://github.com/google/google-java-format/issues/113




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558284980



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -572,9 +566,7 @@ private boolean checkCondition(
     affixReader.setPosition(8 * affix);
     char flag = (char) (affixReader.readShort() & 0xffff);
     affixReader.skipBytes(2); // strip

Review comment:
       That assertion would be incorrect, as those bits are also used sometimes :)
   In fact, I have another [refactoring](https://github.com/donnerpeter/lucene-solr/commit/9122208d04e0f5811c6d292ce40efa8423fc5aad) to be applied after this patch. It deduplicates the code around this place, and there `condition` and `crossProduct` are moved closer to where they're needed and become independent.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558274614



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -572,9 +566,7 @@ private boolean checkCondition(
     affixReader.setPosition(8 * affix);
     char flag = (char) (affixReader.readShort() & 0xffff);
     affixReader.skipBytes(2); // strip

Review comment:
       Any follow-up to Rob's concerns? This condition variable seems to be asserted on later (passed as an argument to checkCondition) and this patch removes the assignment altogether. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558198923



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -251,34 +221,33 @@ public Dictionary(
   }
 
   /** Looks up Hunspell word forms from the dictionary */
-  IntsRef lookupWord(char word[], int offset, int length) {
-    return lookup(words, word, offset, length);
+  IntsRef lookupWord(char[] word, int length) {
+    return lookup(words, word, length);
   }
 
   // only for testing
-  IntsRef lookupPrefix(char word[], int offset, int length) {
-    return lookup(prefixes, word, offset, length);
+  IntsRef lookupPrefix(char[] word) {
+    return lookup(prefixes, word, word.length);
   }
 
   // only for testing
-  IntsRef lookupSuffix(char word[], int offset, int length) {
-    return lookup(suffixes, word, offset, length);
+  IntsRef lookupSuffix(char[] word) {
+    return lookup(suffixes, word, word.length);
   }
 
-  IntsRef lookup(FST<IntsRef> fst, char word[], int offset, int length) {
+  private IntsRef lookup(FST<IntsRef> fst, char[] word, int length) {
     if (fst == null) {
       return null;
     }
     final FST.BytesReader bytesReader = fst.getBytesReader();
-    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<IntsRef>());
+    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<>());
     // Accumulate output as we go
     final IntsRef NO_OUTPUT = fst.outputs.getNoOutput();
     IntsRef output = NO_OUTPUT;
 
-    int l = offset + length;
     try {
-      for (int i = offset, cp = 0; i < l; i += Character.charCount(cp)) {

Review comment:
       So this offset thing was never used? I this not useful to have keep it though? Utility methods could still exist (they'd use offset = 0) but it seems like some flexibility with offset here is fine.

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -16,14 +16,7 @@
  */
 package org.apache.lucene.analysis.hunspell;
 
-import java.io.BufferedInputStream;
-import java.io.BufferedOutputStream;
-import java.io.BufferedReader;
-import java.io.IOException;
-import java.io.InputStream;
-import java.io.InputStreamReader;
-import java.io.LineNumberReader;
-import java.io.OutputStream;
+import java.io.*;

Review comment:
       Did you run gradlew tidy? Wildcard imports shouldn't be there, hence the question.

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -654,7 +646,7 @@ private boolean checkCondition(
                   true,
                   circumfix,
                   caseVariant));
-        } else if (dictionary.complexPrefixes == false && dictionary.twoStageAffix) {
+        } else if (!dictionary.complexPrefixes && dictionary.twoStageAffix) {

Review comment:
       I don't care much but I know some other committers prefer the verbose version (== false).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#issuecomment-760924888


   Thanks Peter.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558246319



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -251,34 +221,33 @@ public Dictionary(
   }
 
   /** Looks up Hunspell word forms from the dictionary */
-  IntsRef lookupWord(char word[], int offset, int length) {
-    return lookup(words, word, offset, length);
+  IntsRef lookupWord(char[] word, int length) {
+    return lookup(words, word, length);
   }
 
   // only for testing
-  IntsRef lookupPrefix(char word[], int offset, int length) {
-    return lookup(prefixes, word, offset, length);
+  IntsRef lookupPrefix(char[] word) {
+    return lookup(prefixes, word, word.length);
   }
 
   // only for testing
-  IntsRef lookupSuffix(char word[], int offset, int length) {
-    return lookup(suffixes, word, offset, length);
+  IntsRef lookupSuffix(char[] word) {
+    return lookup(suffixes, word, word.length);
   }
 
-  IntsRef lookup(FST<IntsRef> fst, char word[], int offset, int length) {
+  private IntsRef lookup(FST<IntsRef> fst, char[] word, int length) {
     if (fst == null) {
       return null;
     }
     final FST.BytesReader bytesReader = fst.getBytesReader();
-    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<IntsRef>());
+    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<>());
     // Accumulate output as we go
     final IntsRef NO_OUTPUT = fst.outputs.getNoOutput();
     IntsRef output = NO_OUTPUT;
 
-    int l = offset + length;
     try {
-      for (int i = offset, cp = 0; i < l; i += Character.charCount(cp)) {

Review comment:
       OK, having created a property-based test framework myself, I'm in a wrong position to argue about randomization usefulness. I'll try this, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558220984



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -209,6 +178,7 @@ public Dictionary(
     this.needsOutputCleaning = false; // set if we have an OCONV
     flagLookup.add(new BytesRef()); // no flags -> ord 0
 
+    Path tempPath = getDefaultTempDir(); // TODO: make this configurable?

Review comment:
       BTW any idea why the default temp dir is used here for reading from aff file twice, but another temp dir is passed to the constructor and used for similarly-temp-looking task (external dictionary entry sorting)? Can these tasks use the same dir? And if yes, which one? Clients would be simpler if they hadn't to care about temp dirs, but maybe there are use cases for passing the temp dir explicitly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558238780



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -251,34 +221,33 @@ public Dictionary(
   }
 
   /** Looks up Hunspell word forms from the dictionary */
-  IntsRef lookupWord(char word[], int offset, int length) {
-    return lookup(words, word, offset, length);
+  IntsRef lookupWord(char[] word, int length) {
+    return lookup(words, word, length);
   }
 
   // only for testing
-  IntsRef lookupPrefix(char word[], int offset, int length) {
-    return lookup(prefixes, word, offset, length);
+  IntsRef lookupPrefix(char[] word) {
+    return lookup(prefixes, word, word.length);
   }
 
   // only for testing
-  IntsRef lookupSuffix(char word[], int offset, int length) {
-    return lookup(suffixes, word, offset, length);
+  IntsRef lookupSuffix(char[] word) {
+    return lookup(suffixes, word, word.length);
   }
 
-  IntsRef lookup(FST<IntsRef> fst, char word[], int offset, int length) {
+  private IntsRef lookup(FST<IntsRef> fst, char[] word, int length) {
     if (fst == null) {
       return null;
     }
     final FST.BytesReader bytesReader = fst.getBytesReader();
-    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<IntsRef>());
+    final FST.Arc<IntsRef> arc = fst.getFirstArc(new FST.Arc<>());
     // Accumulate output as we go
     final IntsRef NO_OUTPUT = fst.outputs.getNoOutput();
     IntsRef output = NO_OUTPUT;
 
-    int l = offset + length;
     try {
-      for (int i = offset, cp = 0; i < l; i += Character.charCount(cp)) {

Review comment:
       Having it here complicates the code and its refactoring a bit. That said, adding a test would remove IntelliJ's warning, so it's also a way to go. But why randomized? Passing `2` or something looks simple enough, and it doesn't seem that randomization would catch new bugs in this context.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558286324



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -572,9 +566,7 @@ private boolean checkCondition(
     affixReader.setPosition(8 * affix);
     char flag = (char) (affixReader.readShort() & 0xffff);
     affixReader.skipBytes(2); // strip

Review comment:
       Ah. Clear - thanks! I unfortunately look at it from the side. Please disregard then.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558218792



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -16,14 +16,7 @@
  */
 package org.apache.lucene.analysis.hunspell;
 
-import java.io.BufferedInputStream;
-import java.io.BufferedOutputStream;
-import java.io.BufferedReader;
-import java.io.IOException;
-import java.io.InputStream;
-import java.io.InputStreamReader;
-import java.io.LineNumberReader;
-import java.io.OutputStream;
+import java.io.*;

Review comment:
       The imports were collapsed by the IDE, but `gradle precheck` didn't complain of wildcards. I can expand them back if that matters.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558277303



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -572,9 +566,7 @@ private boolean checkCondition(
     affixReader.setPosition(8 * affix);
     char flag = (char) (affixReader.readShort() & 0xffff);
     affixReader.skipBytes(2); // strip

Review comment:
       As promised in JIRA, I've inlined the `condition` variable to avoid potential confusion




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2202: LUCENE-9664: Hunspell support: fix most IntelliJ warnings, cleanup

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2202:
URL: https://github.com/apache/lucene-solr/pull/2202#discussion_r558225082



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -209,6 +178,7 @@ public Dictionary(
     this.needsOutputCleaning = false; // set if we have an OCONV
     flagLookup.add(new BytesRef()); // no flags -> ord 0
 
+    Path tempPath = getDefaultTempDir(); // TODO: make this configurable?

Review comment:
       I didn't write this code, so I don't know for sure. I believe the externally-provided temp folder should be used (and reused) but it's of type Directory (it's a Lucene abstraction) so I'm not sure if you'll be able to reuse it for generic Paths.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org