Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/07/08 23:28:51 UTC

[jira] Updated: (LUCENE-2503) light/minimal stemming for euro languages

     [ https://issues.apache.org/jira/browse/LUCENE-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2503:
--------------------------------

    Attachment: LUCENE-2503.patch

I updated the patch; I think this is ready to go:

* added Finnish
* created vocabulary tests from the reference C, Perl, and other implementations, and found/fixed bugs in every language except en, pt, and fr (as promised in my last comment)
* created a VocabularyAssert JUnit utility class, and refactored the existing Snowball, Porter, German, and Russian tests to use it too.
* moved a bunch of utility code that was duplicated everywhere, such as endsWith()/delete(), into StemmerUtil.
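For readers unfamiliar with this style of stemmer: the light stemmers work in place on a char[] buffer plus a logical length, so "deleting" a suffix just means shifting characters and returning a shorter length. The block below is a minimal sketch of the kind of shared helpers described above. The class name SuffixHelpers and these exact signatures are hypothetical illustrations, not the actual StemmerUtil API from the patch.

```java
// Hypothetical sketch of shared suffix helpers in the spirit of StemmerUtil.
// Not the real Lucene class: names and signatures are illustrative only.
final class SuffixHelpers {
  private SuffixHelpers() {}

  /** True if the first len chars of s end with suffix. */
  static boolean endsWith(char[] s, int len, String suffix) {
    final int suffixLen = suffix.length();
    if (suffixLen > len) {
      return false;
    }
    // Compare the last suffixLen chars of the logical buffer with the suffix.
    for (int i = 0; i < suffixLen; i++) {
      if (s[len - suffixLen + i] != suffix.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  /** Removes the char at pos by shifting the tail left; returns the new length. */
  static int delete(char[] s, int pos, int len) {
    if (pos < len - 1) {
      System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
    }
    return len - 1;
  }

  /** Removes nChars chars starting at pos; returns the new length. */
  static int deleteN(char[] s, int pos, int len, int nChars) {
    for (int i = 0; i < nChars; i++) {
      len = delete(s, pos, len);
    }
    return len;
  }
}
```

A stemmer rule then reads as a check followed by a length update, e.g. `if (endsWith(buf, len, "ing")) len = deleteN(buf, len - 3, len, 3);`, with no String allocation per token.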

To apply: first apply the patch file itself, then unzip the zip file containing the vocabulary tests (LUCENE-2503_modules_analysis_testdata.zip) from within the modules/analysis/common dir.
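The vocabulary tests boil down to running each word through the stemmer and comparing the result against the stem produced by the reference implementation. The block below is a minimal sketch of that check, assuming a plain-text format of one `word<TAB>expected-stem` pair per line; that format, the class name VocabularyCheck, and the UnaryOperator-based stemmer interface are all hypothetical and not necessarily what VocabularyAssert in the patch does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a vocabulary check. Assumed format: each non-empty
// line is "word<TAB>expected-stem". Not the real VocabularyAssert API.
final class VocabularyCheck {
  private VocabularyCheck() {}

  /** Returns one message per line whose stem differs from the expected stem. */
  static List<String> mismatches(Reader vocabulary, UnaryOperator<String> stemmer)
      throws IOException {
    List<String> failures = new ArrayList<>();
    BufferedReader in = new BufferedReader(vocabulary);
    String line;
    int lineNo = 0;
    while ((line = in.readLine()) != null) {
      lineNo++;
      if (line.isEmpty()) {
        continue; // tolerate blank lines
      }
      String[] parts = line.split("\t");
      if (parts.length != 2) {
        throw new IllegalArgumentException("line " + lineNo + ": expected word<TAB>stem");
      }
      String actual = stemmer.apply(parts[0]);
      if (!actual.equals(parts[1])) {
        failures.add("line " + lineNo + ": " + parts[0] + " -> " + actual
            + ", expected " + parts[1]);
      }
    }
    return failures;
  }
}
```

Collecting every mismatch instead of failing on the first one makes it easier to see whether a bug affects a single word or a whole suffix class.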

If no one objects, I'll commit in a few days.


> light/minimal stemming for euro languages
> -----------------------------------------
>
>                 Key: LUCENE-2503
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2503
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2503.patch, LUCENE-2503.patch
>
>
> The snowball stemmers are very aggressive and it would be nice if there were lighter alternatives.
> Some applications may want to perform less aggressive stemming, for example:
> http://www.lucidimagination.com/search/document/5d16391e21ca6faf/plural_only_stemmer
> Good, relevance-tested algorithms exist and I think we should provide these alternatives.
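As an illustration of how much lighter a "plural only" stemmer is than Snowball, the block below sketches Harman's S-stemmer, a well-known three-rule English plural stemmer. It is shown only as an example of the category; it is one reading of the published rules (only the first matching rule fires), it adds no minimum-length check, and the class name SStemmer is hypothetical rather than code from the patch.

```java
// Sketch of Harman's "S-stemmer" for English plurals.
// Only the first applicable rule fires; no minimum-length guard is applied.
final class SStemmer {
  private SStemmer() {}

  static String stem(String word) {
    // Rule 1: "-ies" -> "-y", unless the word ends in "-eies" or "-aies".
    if (word.endsWith("ies") && !word.endsWith("eies") && !word.endsWith("aies")) {
      return word.substring(0, word.length() - 3) + "y";
    }
    // Rule 2: "-es" -> "-e", unless the word ends in "-aes", "-ees" or "-oes".
    if (word.endsWith("es") && !word.endsWith("aes")
        && !word.endsWith("ees") && !word.endsWith("oes")) {
      return word.substring(0, word.length() - 1);
    }
    // Rule 3: drop a final "-s", unless the word ends in "-us" or "-ss".
    if (word.endsWith("s") && !word.endsWith("us") && !word.endsWith("ss")) {
      return word.substring(0, word.length() - 1);
    }
    return word;
  }
}
```

For example, "ponies" becomes "pony" and "horses" becomes "horse", while "glass" and "bus" are left alone; an aggressive stemmer would also strip derivational suffixes, which is exactly what these applications want to avoid.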

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

