Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2007/06/07 07:40:26 UTC

[jira] Resolved: (LUCENE-915) PorterStemmer is incorrectly truncating words ending in e

     [ https://issues.apache.org/jira/browse/LUCENE-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved LUCENE-915.
-----------------------------

    Resolution: Invalid

> I'd imagine you aren't going to fix it since it would require explicit 'exception word' 
> checking being added to the algorithm. 

...well, my point actually is that there is no bug to fix -- the algorithm is what it is, and the code implements the algorithm.

changing the code wouldn't be fixing a bug; it would be breaking the PorterStemmer class so that it no longer does what it says it does: implement the Porter Stemming Algorithm.

i'm sure there are *lots* of other use cases unrelated to the ones you outlined where people could argue that the Porter algorithm does something they don't want -- but that's just the nature of algorithmic stemmers.  as outlined on the Porter Stemmer homepage...

"The most frequently asked question is why word X should be stemmed to x1, when one would have expected it to be stemmed to x2. It is important to remember that the stemming algorithm cannot achieve perfection. On balance it will (or may) improve IR performance, but in individual cases it may sometimes make what are, or what seem to be, errors."

> PorterStemmer is incorrectly truncating words ending in e
> ---------------------------------------------------------
>
>                 Key: LUCENE-915
>                 URL: https://issues.apache.org/jira/browse/LUCENE-915
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index, QueryParser, Search
>    Affects Versions: 1.9
>         Environment: Java 1.5 on Mac OS X 10.4.
>            Reporter: Paul Curren
>
> Searching for the word 'orange' will result incorrectly in matches for 'orang'.
> Likewise, searching for 'apple' will incorrectly match 'appl'.
> The problem is in step6() of the PorterStemmer class.
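
For readers wondering why 'orange' and 'apple' lose their final e, here is a minimal sketch of the final-e rule from the published Porter algorithm (step 5a in Porter's description, which appears to be what step6() in the PorterStemmer class implements). It is not Lucene's code: the class and method names (FinalEDemo, measure, endsCvc, stripFinalE) are hypothetical, it assumes lowercase input, and it covers only this one rule rather than the whole stemmer.

```java
public class FinalEDemo {
    // True if the letter at index i is a consonant in Porter's sense:
    // not a, e, i, o, u, and not a 'y' that follows a consonant.
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Porter's measure m: the number of vowel-run-to-consonant-run
    // transitions in the stem, i.e. the m in [C](VC)^m[V].
    static int measure(String stem) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean consonant = isConsonant(stem, i);
            if (consonant && prevVowel) {
                m++;
            }
            prevVowel = !consonant;
        }
        return m;
    }

    // Porter's *o condition: the stem ends consonant-vowel-consonant
    // and the final consonant is not w, x, or y.
    static boolean endsCvc(String stem) {
        int n = stem.length();
        if (n < 3) {
            return false;
        }
        if (!isConsonant(stem, n - 3) || isConsonant(stem, n - 2) || !isConsonant(stem, n - 1)) {
            return false;
        }
        char last = stem.charAt(n - 1);
        return last != 'w' && last != 'x' && last != 'y';
    }

    // The final-e rule: drop a trailing 'e' if m > 1,
    // or if m == 1 and the stem does not satisfy *o.
    static String stripFinalE(String word) {
        if (!word.endsWith("e")) {
            return word;
        }
        String stem = word.substring(0, word.length() - 1);
        int m = measure(stem);
        if (m > 1 || (m == 1 && !endsCvc(stem))) {
            return stem;
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"orange", "apple", "cease", "rate"}) {
            System.out.println(w + " -> " + stripFinalE(w));
        }
    }
}
```

Under this rule "orang" has m = 2, so the e is dropped; "appl" has m = 1 and does not end consonant-vowel-consonant, so the e is dropped there too. By contrast, "rate" keeps its e because "rat" does end consonant-vowel-consonant. The behaviour reported in the issue follows from the rule itself, which is why an exception-word list would be a departure from the algorithm rather than a fix.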
