Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/01 18:02:27 UTC

[GitHub] [lucene] thomasschuerger opened a new issue, #11733: Provide a version of GermanNormalizationFilter that uses a modified Umlaut mapping

thomasschuerger opened a new issue, #11733:
URL: https://github.com/apache/lucene/issues/11733

   ### Description
   
   The GermanNormalizationFilter applies the following mappings: ä/ae -> a, ö/oe -> o, ü/ue -> u and ß -> ss (plus some simple rules for when "ue" should not be converted to "u"). Folding an umlaut to its bare vowel is very uncommon in German. The usual convention is to treat ä and ae, ö and oe, ü and ue, as well as ß and ss as equivalent (the ASCII spellings are used where the non-ASCII characters are unavailable, e.g. on an English keyboard or when a system does not accept these characters). With the current mapping, searching for "Uber" (the company) also finds the frequent word "über", which is unexpected, because "u" and "ü" are (normally) not treated as equivalent.
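
To make the "Uber" / "über" collision concrete, here is a minimal sketch in plain Java. It is not Lucene's code: the class name `UmlautFoldingSketch` and the method `fold` are hypothetical, the lowercasing step merely stands in for whatever lowercasing happens earlier in an analysis chain, and the ae/oe/ue digraph rules are left out. Non-ASCII characters are written as Unicode escapes so the source does not depend on file encoding.

```java
import java.util.Locale;

// Simplified, hypothetical model of the single-character folding described above.
// It is NOT Lucene's implementation and omits the ae/oe/ue digraph rules.
final class UmlautFoldingSketch {

    // Lowercases the term (a stand-in for an upstream lowercasing step), then
    // folds each umlaut to its bare vowel and sharp s to "ss".
    static String fold(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(lower.length() + 2);
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a-umlaut -> a
                case '\u00f6': out.append('o'); break;   // o-umlaut -> o
                case '\u00fc': out.append('u'); break;   // u-umlaut -> u
                case '\u00df': out.append("ss"); break;  // sharp s  -> ss
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, `fold("Uber")` and `fold("\u00fcber")` produce the same term, so a query for the company name matches the preposition.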
   
   Therefore I would like to see a filter that normalizes German by mapping ä -> ae, ö -> oe, ü -> ue and ß -> ss. This could be either an additional parameter for GermanNormalizationFilter that switches to that mapping (the existing mapping should of course remain the default), or a separate filter (GermanNormalizationFilter2?) with that mapping.
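
The requested mapping can be sketched as follows. This is only an illustration in plain Java, not a proposed patch: `UmlautExpansionSketch` and `expand` are hypothetical names, the input is assumed to be lowercased already, and in an actual Lucene filter the same per-character logic would presumably live in a `TokenFilter`'s `incrementToken()` and operate on the term attribute rather than on a `String`. Non-ASCII characters are written as Unicode escapes so the source does not depend on file encoding.

```java
// Hypothetical sketch of the requested mapping; assumes the term is already lowercase.
final class UmlautExpansionSketch {

    // Expands each umlaut to its two-letter ASCII spelling and sharp s to "ss".
    // All other characters, including a plain "u", pass through unchanged.
    static String expand(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break;  // a-umlaut -> ae
                case '\u00f6': out.append("oe"); break;  // o-umlaut -> oe
                case '\u00fc': out.append("ue"); break;  // u-umlaut -> ue
                case '\u00df': out.append("ss"); break;  // sharp s  -> ss
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this mapping, "über" and "ueber" normalize to the same term, while "uber" stays distinct from both.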
   
   Using a charfilter is not the same, because a charfilter runs on the raw text before the tokenizer and therefore before the whole filter chain. The new filter should be a drop-in replacement for GermanNormalizationFilter at any position in the filter chain.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org