Posted to issues@lucene.apache.org by "Ben Kazez (Jira)" <ji...@apache.org> on 2020/06/18 14:55:00 UTC

[jira] [Updated] (LUCENE-9410) German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert

     [ https://issues.apache.org/jira/browse/LUCENE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Kazez updated LUCENE-9410:
------------------------------
    Description: 
I'm using Lucene via Elasticsearch 7.7.1. The German and French stemmers (whether via the Snowball analyzer or the "light" or "heavy" stemming analyzers) fail to reduce some common inflected forms to the same stem as their base form:

- French:
  - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged

- German:
  - "schlummert" should match "schlummern" (infinitive) but instead is unchanged
  - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend"
  - "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst"

The Elasticsearch folks [said|https://discuss.elastic.co/t/better-french-and-german-stemming/236283] I should file a bug with Lucene.

  was:
I'm using Lucene via Elasticsearch 7.7.1 and have run into an issue where German and French stemming (either via the Snowball analyzer, or the "light" or "heavy" stemming analyzers) fails to identify some common forms:

- French:
  - "maux" should match "mal" ("maux" is plural of "mal") but instead "maux" is unchanged
- German:
  - "schlummert" should match "schlummern" (infinitive) but instead is unchanged
  - "grüßend" should match "grüßen" (infinitive) but instead yields "grussend"
  - "gegrüßt"  should match "grüßen" (infinitive) but instead yields "gegrusst"

The folks from Elasticsearch said I should file a bug with Lucene: https://discuss.elastic.co/t/better-french-and-german-stemming/236283
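
The forms listed in the description could be turned into a small conflation check, sketched below in Python. This is only an illustration under stated assumptions: `stub_stem`, `REPORTED_STEMS`, `EXPECTED_PAIRS` and `find_mismatches` are hypothetical names, not part of Lucene or Elasticsearch. The stub returns the stems reported in this issue for the four inflected forms; for every other word (including the base forms "mal", "schlummern" and "grüßen") it returns the word unchanged, which is a placeholder and not real stemmer output. To test a real analyzer, replace `stub_stem` with a function that calls it.

```python
from typing import Callable, Iterable, List, Tuple

# Stems reported in this issue for the inflected forms.
REPORTED_STEMS = {
    "maux": "maux",
    "schlummert": "schlummert",
    "grüßend": "grussend",
    "gegrüßt": "gegrusst",
}

# (inflected form, base form) pairs that the report expects to share a stem.
EXPECTED_PAIRS = [
    ("maux", "mal"),
    ("schlummert", "schlummern"),
    ("grüßend", "grüßen"),
    ("gegrüßt", "grüßen"),
]


def stub_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer call.

    Returns the stem reported in the issue when one is known. For any other
    word it returns the input unchanged: a placeholder, NOT real stemmer output.
    """
    return REPORTED_STEMS.get(word, word)


def find_mismatches(
    stem: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str, str]]:
    """Return (inflected, stem, base, stem) for each pair whose stems differ."""
    mismatches = []
    for inflected, base in pairs:
        stem_inflected = stem(inflected)
        stem_base = stem(base)
        if stem_inflected != stem_base:
            mismatches.append((inflected, stem_inflected, base, stem_base))
    return mismatches


if __name__ == "__main__":
    for inflected, s_inf, base, s_base in find_mismatches(stub_stem, EXPECTED_PAIRS):
        print(f"{inflected!r} -> {s_inf!r} does not match {base!r} -> {s_base!r}")
```

Passing the stemmer in as a callable keeps the list of expected pairs separate from any particular analyzer, so the same pairs could be checked against the Snowball, "light" and "heavy" variants in turn.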


> German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-9410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9410
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 8.5
>         Environment: Elasticsearch 7.7.1 running on cloud.elastic.co
>            Reporter: Ben Kazez
>            Priority: Major
>              Labels: french, german, stemmer, stemming


