Posted to issues@lucene.apache.org by "Ben Kazez (Jira)" <ji...@apache.org> on 2020/06/18 14:54:00 UTC

[jira] [Created] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert

Ben Kazez created LUCENE-9410:
---------------------------------

             Summary: German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
                 Key: LUCENE-9410
                 URL: https://issues.apache.org/jira/browse/LUCENE-9410
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 8.5
         Environment: Elasticsearch 7.7.1 running on cloud.elastic.co
            Reporter: Ben Kazez


I'm using Lucene via Elasticsearch 7.7.1 and have run into an issue where German and French stemming (whether via the Snowball analyzer or via the "light" or "heavy" stemming analyzers) fails to reduce some common inflected forms to the same stem as their base form:

- French:
  - "maux" should match "mal" ("maux" is the plural of "mal") but instead "maux" is left unchanged
- German:
  - "schlummert" should match "schlummern" (infinitive) but instead is unchanged
  - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend"
  - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst"
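
The expectations above can be written down as a small table-driven check. This is only a minimal sketch in Python, not Lucene code: the names CASES, find_mismatches and placeholder_stem are hypothetical, and placeholder_stem is a stand-in that merely lowercases its input. To reproduce the report, it would have to be replaced by a function that returns the token a real analyzer emits for the given word.

```python
from typing import Callable, List, Tuple

# (language, inflected form, base form) triples taken from the list above.
CASES: List[Tuple[str, str, str]] = [
    ("french", "maux", "mal"),
    ("german", "schlummert", "schlummern"),
    ("german", "grüßend", "grüßen"),
    ("german", "gegrüßt", "grüßen"),
]


def find_mismatches(
    stem: Callable[[str, str], str],
    cases: List[Tuple[str, str, str]] = CASES,
) -> List[Tuple[str, str, str, str, str]]:
    """Return the cases whose two forms do not reduce to the same stem.

    `stem(language, word)` is supplied by the caller; each result row is
    (language, inflected, stem of inflected, base, stem of base).
    """
    mismatches = []
    for language, inflected, base in cases:
        inflected_stem = stem(language, inflected)
        base_stem = stem(language, base)
        if inflected_stem != base_stem:
            mismatches.append(
                (language, inflected, inflected_stem, base, base_stem)
            )
    return mismatches


def placeholder_stem(language: str, word: str) -> str:
    # Hypothetical stand-in: this does NOT implement any Lucene or
    # Snowball stemmer. It only lowercases the word so the sketch runs.
    return word.lower()


if __name__ == "__main__":
    for language, inflected, s1, base, s2 in find_mismatches(placeholder_stem):
        print(f"{language}: {inflected!r} -> {s1!r}, {base!r} -> {s2!r}")
```

With a real stemmer plugged in, an empty result would mean every pair conflates as expected; any row returned is a pair that a search for one form would not match against the other.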

The folks from Elasticsearch said I should file a bug with Lucene: https://discuss.elastic.co/t/better-french-and-german-stemming/236283



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
