Posted to issues@lucene.apache.org by "Robert Muir (Jira)" <ji...@apache.org> on 2021/11/30 21:15:00 UTC

[jira] [Resolved] (LUCENE-10248) Add SpanishPluralStemFilter

     [ https://issues.apache.org/jira/browse/LUCENE-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-10248.
----------------------------------
    Fix Version/s: 9.1
       Resolution: Fixed

Thank you [~xavier.sanchez] for the great contribution and write-up!

> Add SpanishPluralStemFilter
> ---------------------------
>
>                 Key: LUCENE-10248
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10248
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 9.0
>            Reporter: Xavier Sanchez Loro
>            Priority: Major
>             Fix For: 9.1
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We propose a new Spanish stemmer that only stems plural forms to singular while preserving gender: the SpanishPluralStemmer. Our goal is to provide a lightweight algorithmic approach with better precision and recall than current approaches.
> In the following [article|https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373] we compare different Spanish stemmers across several use cases and explain what value our contribution adds.
> Our solution is an algorithmic approach based on the Spanish rules for building plural forms,
> as defined in [wikilengua|http://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n)].
> Some characteristics:
>  * Designed to stem just plural to singular form
>  * Distinguishes between masculine and feminine forms
>  * It increases recall, but precision may be reduced depending on the use case/information need
>  * Stems plural words of foreign origin, e.g. complots, bits, punks, robots
>  * Supports invariant words, whose plural and singular forms are identical or for which a plural does not make sense, e.g. crisis, jueves, lapsus, abrebotellas
>  * Supports special cases, e.g. yoes, clubes, itemes, faralaes
>  * Use it when the distinction between singular and plural is not relevant but gender is relevant
>  * Produces meaningful tokens in singular form
>  ** No strange stems like “amig”: stemmers admittedly need not generate grammatically correct tokens, but generating correct stems reduces the chance of collisions with other words
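
The characteristics listed above can be illustrated with a minimal, self-contained sketch. This is not the code contributed in LUCENE-10248: the class name PluralStemSketch, the word lists, and the suffix rules are all assumptions chosen for illustration, and the rules are deliberately incomplete (for example, they ignore accents and cannot tell "flores" from "padres"). The sketch only shows the general shape of the approach: check invariant words first, then special cases, then strip plural suffixes so that the result is a real singular form that keeps its gender.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy stemmer; NOT the Lucene SpanishPluralStemmer implementation.
final class PluralStemSketch {

    // Assumed sample of invariant words (taken from the examples in the issue).
    private static final Set<String> INVARIANT =
            Set.of("crisis", "jueves", "lapsus", "abrebotellas");

    // Assumed sample of special cases with their singular forms.
    private static final Map<String, String> SPECIAL =
            Map.of("yoes", "yo", "clubes", "club");

    private PluralStemSketch() {}

    static String stem(String word) {
        int n = word.length();
        // Very short words and words not ending in "s" are left untouched.
        if (n < 3 || !word.endsWith("s")) {
            return word;
        }
        // Invariant words have the same singular and plural form.
        if (INVARIANT.contains(word)) {
            return word;
        }
        // Special cases are resolved by lookup.
        String special = SPECIAL.get(word);
        if (special != null) {
            return special;
        }
        // luces -> luz: the plural of a word ending in "z" is built with "ces".
        if (word.endsWith("ces")) {
            return word.substring(0, n - 3) + "z";
        }
        // flores -> flor: simplified rule, "es" is dropped after d, l, n or r.
        if (word.endsWith("es") && n > 3 && "dlnr".indexOf(word.charAt(n - 3)) >= 0) {
            return word.substring(0, n - 2);
        }
        // amigos -> amigo, amigas -> amiga, robots -> robot: drop the final "s".
        return word.substring(0, n - 1);
    }
}
```

With these assumed rules, PluralStemSketch.stem("amigas") returns "amiga" and PluralStemSketch.stem("amigos") returns "amigo", so the two genders stay distinct, whereas a more aggressive stemmer might map both to "amig".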


