Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/02/10 08:20:57 UTC

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2332: LUCENE-9750: Hunspell: improve suggestions for mixed-case misspelled words

dweiss commented on a change in pull request #2332:
URL: https://github.com/apache/lucene-solr/pull/2332#discussion_r573526495



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Dictionary.java
##########
@@ -70,7 +70,7 @@
 
 /** In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary. */
 public class Dictionary {
-  // Derived from woorm/ openoffice dictionaries.
+  // Derived from woorm/LibreOffice dictionaries.

Review comment:
       Thanks!

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/SpellChecker.java
##########
@@ -433,24 +433,26 @@ private boolean canBeBrokenAt(String word, String breakStr, int breakPos) {
 
     Set<String> result = new LinkedHashSet<>();
     for (String candidate : suggestions) {
-      result.add(adjustSuggestionCase(candidate, wordCase));
+      result.add(adjustSuggestionCase(candidate, wordCase, word));
       if (wordCase == WordCase.UPPER && dictionary.checkSharpS && candidate.contains("ß")) {
         result.add(candidate);
       }
     }
     return new ArrayList<>(result);
   }
 
-  private String adjustSuggestionCase(String candidate, WordCase original) {
-    if (original == WordCase.UPPER) {
+  private String adjustSuggestionCase(String candidate, WordCase originalCase, String original) {
+    if (originalCase == WordCase.UPPER) {
       String upper = candidate.toUpperCase(Locale.ROOT);
       if (upper.contains(" ") || spell(upper)) {
         return upper;
       }
     }
-    if (original == WordCase.UPPER || original == WordCase.TITLE) {
-      String title = dictionary.toTitleCase(candidate);
-      return spell(title) ? title : candidate;
+    if (Character.isUpperCase(original.charAt(0))) {
+      String title = Character.toUpperCase(candidate.charAt(0)) + candidate.substring(1);
+      if (title.contains(" ") || spell(title)) {

Review comment:
      Just out of curiosity, Hunspell doesn't take unusual whitespace characters (like the non-breaking space) into account, does it? I've spent a number of hours of my life debugging things that *should* have worked, judging by the input, only to find that the whitespace wasn't actually " "...
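      The `title.contains(" ")` check in the diff matches only U+0020. The sketch below uses nothing but `java.lang.Character` and `String` to show how a non-breaking space (U+00A0) gets past that check, and which JDK predicates recognize it. The class `WhitespaceCheck` and the helper `containsAnySpace` are hypothetical illustrations and are not part of Lucene's Hunspell code. Whether Hunspell itself should treat such characters as separators remains the open question above.

```java
public class WhitespaceCheck {
  // Hypothetical helper (not in Lucene): true if s contains any Unicode space
  // separator (categories Zs, Zl, Zp), which includes U+00A0 NO-BREAK SPACE.
  static boolean containsAnySpace(String s) {
    return s.codePoints().anyMatch(Character::isSpaceChar);
  }

  public static void main(String[] args) {
    String plain = "New York";
    String nbsp = "New\u00A0York"; // prints identically to `plain` in most fonts

    // String.contains(" ") only looks for U+0020.
    System.out.println(plain.contains(" ")); // true
    System.out.println(nbsp.contains(" "));  // false

    // Character.isWhitespace explicitly excludes non-breaking spaces.
    System.out.println(Character.isWhitespace('\u00A0')); // false

    // Character.isSpaceChar accepts it, because U+00A0 is in category Zs.
    System.out.println(Character.isSpaceChar('\u00A0')); // true

    System.out.println(containsAnySpace(nbsp)); // true
  }
}
```

      In short, `isWhitespace` is the wrong predicate if the goal is "anything that looks like a space"; `isSpaceChar` (or an explicit list of code points) is closer.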



