Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/04/14 21:22:17 UTC

[GitHub] [lucene] janhoy commented on a change in pull request #84: LUCENE-9929 Make ScandinavianNormalizationFilter configurable wrt fol…

janhoy commented on a change in pull request #84:
URL: https://github.com/apache/lucene/pull/84#discussion_r613595880



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.java
##########
@@ -33,14 +34,45 @@
  * <p>blåbærsyltetøj == blåbärsyltetöj == blaabaarsyltetoej but not blabarsyltetoj räksmörgås ==
  * ræksmørgås == ræksmörgaos == raeksmoergaas but not raksmorgas
  *
+ * <p>You can choose which of the foldings to apply (aa, ao, ae, oe, oo) through a parameter.
+ *
  * @see ScandinavianFoldingFilter
  */
 public final class ScandinavianNormalizationFilter extends TokenFilter {
 
+  /**
+   * Create the filter with default folding rules, backward compatible with all earlier versions
+   *
+   * @param input the TokenStream
+   */
   public ScandinavianNormalizationFilter(TokenStream input) {
     super(input);
+    this.foldings = ALL_FOLDINGS;
   }
 
+  /**
+   * Create the filter using custom folding rules.
+   *
+   * @param input the TokenStream
+   * @param foldings a Set of Foldings to apply (i.e. AE, OE, AA, AO, OO)
+   */
+  public ScandinavianNormalizationFilter(TokenStream input, Set<Foldings> foldings) {

Review comment:
   We can get similar API usability in Lucene by adding helper constants:
   ```java
   public static final Set<Foldings> ALL_FOLDINGS = Set.of(AA, AO, OO, AE, OE);
   public static final Set<Foldings> NORWEGIAN_FOLDINGS = Set.of(AE, OE, AA);
   public static final Set<Foldings> DANISH_FOLDINGS = NORWEGIAN_FOLDINGS;
   public static final Set<Foldings> SWEDISH_FOLDINGS = ALL_FOLDINGS;
   ```
   In the factory, that could translate to a "language" parameter with predefined settings.
   
   I'm not opposed to thin wrapper filters for each language, but I'd like some feedback from other Scandinavian users on what those should default to.
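
As a rough illustration of the "language" parameter idea, here is a minimal sketch. The class `FoldingPresets`, the method `forLanguage`, and the accepted language strings are hypothetical names chosen for this example and are not part of the PR. The nested `Foldings` enum is only a stand-in for the one the PR introduces, and the per-language defaults simply mirror the constants proposed above, which are still open for feedback.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: maps a "language" setting to a folding preset.
class FoldingPresets {

  /** Stand-in for the Foldings enum introduced by the pull request. */
  enum Foldings { AA, AO, AE, OE, OO }

  // Presets mirroring the constants proposed in the review comment.
  static final Set<Foldings> ALL_FOLDINGS =
      Set.of(Foldings.AA, Foldings.AO, Foldings.OO, Foldings.AE, Foldings.OE);
  static final Set<Foldings> NORWEGIAN_FOLDINGS =
      Set.of(Foldings.AE, Foldings.OE, Foldings.AA);
  static final Set<Foldings> DANISH_FOLDINGS = NORWEGIAN_FOLDINGS;
  static final Set<Foldings> SWEDISH_FOLDINGS = ALL_FOLDINGS;

  /**
   * Resolve a language name to its preset. A null value falls back to
   * ALL_FOLDINGS, matching the backward-compatible default constructor.
   */
  static Set<Foldings> forLanguage(String language) {
    if (language == null) {
      return ALL_FOLDINGS;
    }
    switch (language.toLowerCase(Locale.ROOT)) {
      case "norwegian":
        return NORWEGIAN_FOLDINGS;
      case "danish":
        return DANISH_FOLDINGS;
      case "swedish":
        return SWEDISH_FOLDINGS;
      default:
        throw new IllegalArgumentException("Unknown language: " + language);
    }
  }
}
```

A factory could then pass the result of `FoldingPresets.forLanguage(...)` to the two-argument constructor shown in the diff, so the filter itself needs no per-language subclasses.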



