Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/29 07:36:16 UTC

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2260: LUCENE-9704: Hunspell: support capitalization for German ß

dweiss commented on a change in pull request #2260:
URL: https://github.com/apache/lucene-solr/pull/2260#discussion_r566626601



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -158,14 +173,52 @@ WordCase caseOf(char[] word, int length) {
     return null;
   }
 
+  List<char[]> sharpSVariations(char[] word, int length) {
+    if (!dictionary.checkSharpS) return Collections.emptyList();
+
+    Stream<String> result =

Review comment:
       Does it make sense to use language trickery (an anonymous subclass of Object) just to hide those two methods? I see that replaceSS calls itself recursively, but I'd just move both methods to static methods at the top class level and use them directly from there.
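A minimal sketch of the static-helper layout suggested here, not the code from the pull request. The class name `SharpSStaticSketch` and the exact signatures are hypothetical; the word buffer and its length are passed explicitly because static methods cannot capture them the way the anonymous class does. The depth cut-off and the `dictionary.checkSharpS` guard are left out for brevity.

```java
import java.util.stream.Stream;

// Hypothetical holder class; names are illustrative and not taken from the pull request.
final class SharpSStaticSketch {
  private SharpSStaticSketch() {}

  /** Returns the index of the first "ss" at or after start, or -1 if there is none. */
  static int findSS(char[] word, int length, int start) {
    for (int i = start; i < length - 1; i++) {
      if (word[i] == 's' && word[i + 1] == 's') {
        return i;
      }
    }
    return -1;
  }

  /**
   * Streams every spelling of word[start..length) in which each non-overlapping "ss"
   * is either kept or replaced by sharp s (U+00DF). The unchanged spelling is included.
   */
  static Stream<String> replaceSS(char[] word, int length, int start) {
    int ss = findSS(word, length, start);
    if (ss < 0) {
      // No further "ss": the remainder is the only variation.
      return Stream.of(new String(word, start, length - start));
    }
    String prefix = new String(word, start, ss - start);
    return replaceSS(word, length, ss + 2)
        .flatMap(tail -> Stream.of(prefix + "ss" + tail, prefix + "\u00df" + tail));
  }
}
```

Both helpers stay plainly visible and individually testable, at the cost of two extra parameters per call.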

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -158,14 +173,52 @@ WordCase caseOf(char[] word, int length) {
     return null;
   }
 
+  List<char[]> sharpSVariations(char[] word, int length) {
+    if (!dictionary.checkSharpS) return Collections.emptyList();
+
+    Stream<String> result =
+        new Object() {
+          int findSS(int start) {
+            for (int i = start; i < length - 1; i++) {
+              if (word[i] == 's' && word[i + 1] == 's') {

Review comment:
       I don't think there is. There is an ancient discussion somewhere on the JDK mailing lists about making String.indexOf(String) more algorithmically efficient, but it never made it through: it's hard to beat the naive algorithm in the average, common case, and if you have to deal with degenerate inputs, the advice is to just roll your own version.
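For illustration, a hand-rolled naive search over a `char[]` prefix might look like the sketch below. The class and method names are hypothetical and nothing here is a JDK or Lucene API; it simply compares the pattern at every candidate position, which is quadratic in the worst case but cheap for short patterns such as "ss".

```java
// Hypothetical utility; a sketch of the naive algorithm, not an existing API.
final class CharArraySearchSketch {
  private CharArraySearchSketch() {}

  /**
   * Returns the first index >= start at which pattern occurs within text[0..length),
   * or -1 if it does not occur.
   */
  static int indexOf(char[] text, int length, char[] pattern, int start) {
    int last = length - pattern.length; // last position where the pattern still fits
    for (int i = Math.max(start, 0); i <= last; i++) {
      int j = 0;
      while (j < pattern.length && text[i + j] == pattern[j]) {
        j++;
      }
      if (j == pattern.length) {
        return i;
      }
    }
    return -1;
  }
}
```

Working on the array directly avoids allocating a `String` just to call `String.indexOf`.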

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/WordCase.java
##########
@@ -58,4 +58,20 @@ private static WordCase get(boolean startsWithLower, boolean seenUpper, boolean
     }
     return seenUpper ? MIXED : LOWER;
   }
+
+  private static CharCase charCase(char c) {
+    if (Character.isUpperCase(c)) {
+      return CharCase.UPPER;
+    }
+    if (Character.isLowerCase(c) && Character.toUpperCase(c) != c) {
+      return CharCase.LOWER;
+    }
+    return CharCase.NEUTRAL;

Review comment:
       Ah, you learn things every day...
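For context, a small sketch of the JDK behavior that the `Character.toUpperCase(c) != c` guard in the hunk above accounts for. The class name is hypothetical, and the `CharCase` enum is declared locally only so the example compiles on its own. `Character.isLowerCase` reports sharp s (U+00DF) as lowercase, yet `Character.toUpperCase(char)` returns it unchanged because it has no single-character uppercase mapping; only `String.toUpperCase` expands it to "SS".

```java
// Hypothetical demo class; CharCase mirrors the names in the diff but is declared here.
final class SharpSCaseSketch {
  enum CharCase { UPPER, LOWER, NEUTRAL }

  private SharpSCaseSketch() {}

  static CharCase charCase(char c) {
    if (Character.isUpperCase(c)) {
      return CharCase.UPPER;
    }
    // A lowercase char with no single-char uppercase form (such as U+00DF)
    // fails the second condition and is treated as NEUTRAL.
    if (Character.isLowerCase(c) && Character.toUpperCase(c) != c) {
      return CharCase.LOWER;
    }
    return CharCase.NEUTRAL;
  }
}
```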

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/Stemmer.java
##########
@@ -158,14 +173,52 @@ WordCase caseOf(char[] word, int length) {
     return null;
   }
 
+  List<char[]> sharpSVariations(char[] word, int length) {
+    if (!dictionary.checkSharpS) return Collections.emptyList();
+
+    Stream<String> result =

Review comment:
       I also wonder whether all that stream trickery is worth it compared with a simple collector list pushed down through the recursion... Streams are nice, but a list requires less thinking (for me).
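A minimal sketch of the collector-list alternative, under the assumption that the input is available as a `String`. All names are hypothetical and the depth cut-off is again omitted. The recursion passes one shared result list and a reusable `StringBuilder` prefix downward instead of composing streams on the way back up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; illustrates the recursion shape only, not the pull request's code.
final class SharpSCollectorSketch {
  private SharpSCollectorSketch() {}

  /** Returns all spellings of word with each non-overlapping "ss" kept or replaced by U+00DF. */
  static List<String> variations(String word) {
    List<String> out = new ArrayList<>();
    collect(word, 0, new StringBuilder(), out);
    return out;
  }

  private static void collect(String word, int from, StringBuilder prefix, List<String> out) {
    int ss = word.indexOf("ss", from);
    if (ss < 0) {
      out.add(prefix + word.substring(from)); // no more "ss": emit one finished variation
      return;
    }
    int mark = prefix.length();
    prefix.append(word, from, ss); // copy the text before the "ss"
    int branch = prefix.length();

    prefix.append("ss"); // branch 1: keep "ss"
    collect(word, ss + 2, prefix, out);

    prefix.setLength(branch);
    prefix.append('\u00df'); // branch 2: replace with sharp s
    collect(word, ss + 2, prefix, out);

    prefix.setLength(mark); // restore the prefix for the caller
  }
}
```

The result order is deterministic: variations that keep an earlier "ss" come before those that replace it.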



