You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/22 12:19:46 UTC

[GitHub] [lucene-solr] donnerpeter opened a new pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

donnerpeter opened a new pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238


   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Lucene or Solr:
   
   * https://issues.apache.org/jira/projects/LUCENE
   * https://issues.apache.org/jira/projects/SOLR
   
   You will need to create an account in Jira in order to create an issue.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * LUCENE-####: <short description of problem or changes>
   * SOLR-####: <short description of problem or changes>
   
   LUCENE and SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   Hunspell: check that all flags are > 0 and fit char range, as Hunspell does
   
   # Solution
   
   1. Add a corresponding assertion
   2. Fix HIDDEN_FLAG serialization bug found by this assertion
   3. Use char instead of int to store and pass flags around, because now we can use 0 for an unset flag instead of -1
   
   # Tests
   
   No new tests, but some tests failed after implementing step 1 and were fixed in step 2
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563528897



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
         if (replacement.isEmpty()) {
           continue;
         }
-        flags[upto++] = (char) Integer.parseInt(replacement);
+        int flag = Integer.parseInt(replacement);
+        if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well
+          throw new IllegalArgumentException(

Review comment:
       It'd be great to be consistent with exceptions when parsing input - sometimes it's ParsingException, here it's IAE. 

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Is this intentional? Because it changes the logic of concatenation (always leaving the trailing comma). I liked the previous version better (always leaving the output neat).

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -588,7 +577,7 @@ private boolean checkCondition(
   }
 
   private boolean isFlagAppendedByAffix(int affixId, char flag) {
-    if (affixId < 0) return false;
+    if (affixId < 0 || flag == 0) return false;

Review comment:
       Wouldn't it be cleaner to add a constant alias (static variable) FLAG_UNSET for 0 and replace it throughout the code where it compares to zero? You've changed it from -1 to 0 but it really doesn't make it any clearer that it's a "default" unset state. I think it would benefit from being more verbose here.

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
         if (replacement.isEmpty()) {
           continue;
         }
-        flags[upto++] = (char) Integer.parseInt(replacement);
+        int flag = Integer.parseInt(replacement);
+        if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well
+          throw new IllegalArgumentException(

Review comment:
       Eh, I was afraid of that. It'd be good to consolidate it at some point.

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Oh, so something else (other than this method) appends to that stringbuilder? Maybe those places should be fixed instead? I don't have all of the code in front of me, so the question may be naive.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563673451



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
         if (replacement.isEmpty()) {
           continue;
         }
-        flags[upto++] = (char) Integer.parseInt(replacement);
+        int flag = Integer.parseInt(replacement);
+        if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well
+          throw new IllegalArgumentException(

Review comment:
       `ParseException` needs some `errorOffset` obligatorily (which is dubiously filled here with the current line number), and it's not available in this method, and not all callers have anything meaningful to pass there. For consistency, we could replace `ParseException` with something less choosy :)

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Yes, that's `some tests failed after implementing step 1 and were fixed in step 2`. However nice it seemed, it was wrong, because other flags were appended after this one without any comma. Trailing commas are no problem, as empty flags are skipped in the previous method.

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -588,7 +577,7 @@ private boolean checkCondition(
   }
 
   private boolean isFlagAppendedByAffix(int affixId, char flag) {
-    if (affixId < 0) return false;
+    if (affixId < 0 || flag == 0) return false;

Review comment:
       A good idea, thanks!

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Yes. Currently it's just one place which definitely appends no flags before this one, and may append some flags after this, so the implementation is tied to that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563675171



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Yes, that's `some tests failed after implementing step 1 and were fixed in step 2`. However nice it seemed, it was wrong, because other flags were appended after this one without any comma. Trailing commas are no problem, as empty flags are skipped in the previous method.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss closed pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
dweiss closed pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss merged pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
dweiss merged pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563678211



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
         if (replacement.isEmpty()) {
           continue;
         }
-        flags[upto++] = (char) Integer.parseInt(replacement);
+        int flag = Integer.parseInt(replacement);
+        if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well
+          throw new IllegalArgumentException(

Review comment:
       Eh, I was afraid of that. It'd be good to consolidate it at some point.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563680034



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Oh, so something else (other than this method) appends to that stringbuilder? Maybe those places should be fixed instead? I don't have all of the code in front of me, so the question may be naive.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563675377



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -588,7 +577,7 @@ private boolean checkCondition(
   }
 
   private boolean isFlagAppendedByAffix(int affixId, char flag) {
-    if (affixId < 0) return false;
+    if (affixId < 0 || flag == 0) return false;

Review comment:
       A good idea, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563528897



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
         if (replacement.isEmpty()) {
           continue;
         }
-        flags[upto++] = (char) Integer.parseInt(replacement);
+        int flag = Integer.parseInt(replacement);
+        if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well
+          throw new IllegalArgumentException(

Review comment:
       It'd be great to be consistent with exceptions when parsing input - sometimes it's ParsingException, here it's IAE. 

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Is this intentional? Because it changes the logic of concatenation (always leaving the trailing comma). I liked the previous version better (always leaving the output neat).

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -588,7 +577,7 @@ private boolean checkCondition(
   }
 
   private boolean isFlagAppendedByAffix(int affixId, char flag) {
-    if (affixId < 0) return false;
+    if (affixId < 0 || flag == 0) return false;

Review comment:
       Wouldn't it be cleaner to add a constant alias (static variable) FLAG_UNSET for 0 and replace it throughout the code where it compares to zero? You've changed it from -1 to 0 but it really doesn't make it any clearer that it's a "default" unset state. I think it would benefit from being more verbose here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563684678



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1310,10 +1330,8 @@ void appendFlag(char flag, StringBuilder to) {
 
     @Override
     void appendFlag(char flag, StringBuilder to) {

Review comment:
       Yes. Currently it's just one place which definitely appends no flags before this one, and may append some flags after this, so the implementation is tied to that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] donnerpeter commented on a change in pull request #2238: LUCENE-9693: Hunspell: check that all flags are > 0 and fit char range

Posted by GitBox <gi...@apache.org>.
donnerpeter commented on a change in pull request #2238:
URL: https://github.com/apache/lucene-solr/pull/2238#discussion_r563673451



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -1299,7 +1314,12 @@ void appendFlag(char flag, StringBuilder to) {
         if (replacement.isEmpty()) {
           continue;
         }
-        flags[upto++] = (char) Integer.parseInt(replacement);
+        int flag = Integer.parseInt(replacement);
+        if (flag == 0 || flag >= Character.MAX_VALUE) { // read default flags as well
+          throw new IllegalArgumentException(

Review comment:
       `ParseException` needs some `errorOffset` obligatorily (which is dubiously filled here with the current line number), and it's not available in this method, and not all callers have anything meaningful to pass there. For consistency, we could replace `ParseException` with something less choosy :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org