Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2014/07/11 18:38:04 UTC

[jira] [Resolved] (LUCENE-5818) Fix hunspell zero-string overgeneration

     [ https://issues.apache.org/jira/browse/LUCENE-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-5818.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 4.10
                   5.0

> Fix hunspell zero-string overgeneration
> ---------------------------------------
>
>                 Key: LUCENE-5818
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5818
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>             Fix For: 5.0, 4.10
>
>         Attachments: LUCENE-5818.patch
>
>
> Currently, it's allowed to strip suffixes/prefixes all the way down to the empty string. But this is not really allowed, and it creates overgeneration in some cases (especially where endings can stand alone as words; these are typically stopwords, so it causes a lot of damage).
> An example is Czech 'už', which should just stem to itself, but today also stems to 'úžit' because it has a flag compatible with that.
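
The behaviour described above can be illustrated with a small sketch. This is not the Lucene code or the attached patch. It is a toy suffix stripper, and the names `SuffixRule`, `stems`, `allow_empty_remainder`, the flags "X" and "S", and the rule that maps 'úžit' to 'už' are all assumptions invented for illustration (the real Czech dictionary entries are not shown in this issue). It only shows the idea of rejecting a candidate when removing the affix leaves the empty string.

```python
from typing import NamedTuple


class SuffixRule(NamedTuple):
    """Hypothetical, simplified hunspell-style suffix rule."""
    flag: str   # flag a dictionary entry must carry for the rule to apply
    strip: str  # text removed from the stem when the surface form is generated
    add: str    # text appended to the stem when the surface form is generated


# Toy data, invented for this sketch (not taken from a real dictionary).
DICTIONARY = {
    "už": set(),      # standalone word, no flags
    "úžit": {"X"},    # entry whose flag is compatible with the rule below
    "walk": {"S"},
}
RULES = [
    SuffixRule(flag="X", strip="úžit", add="už"),
    SuffixRule(flag="S", strip="", add="s"),
]


def stems(word, dictionary, rules, allow_empty_remainder=False):
    """Return candidate stems for `word`, in the order they are found."""
    results = []
    if word in dictionary:
        results.append(word)
    for rule in rules:
        if not rule.add or not word.endswith(rule.add):
            continue
        # What is left of the word once the affix text is removed.
        remainder = word[: len(word) - len(rule.add)]
        if not remainder and not allow_empty_remainder:
            # The whole word was consumed by the affix: reject the candidate
            # instead of stripping down to the empty string.
            continue
        candidate = remainder + rule.strip
        if rule.flag in dictionary.get(candidate, ()) and candidate not in results:
            results.append(candidate)
    return results
```

With `allow_empty_remainder=True` the sketch mimics the old behaviour, where 'už' yields both 'už' and 'úžit'. With the default, the empty remainder is rejected and 'už' stems only to itself, while an ordinary case such as 'walks' still stems to 'walk'.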



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org