Posted to issues@lucene.apache.org by "Robert Muir (Jira)" <ji...@apache.org> on 2021/11/30 21:15:00 UTC
[jira] [Resolved] (LUCENE-10248) Add SpanishPluralStemFilter
[ https://issues.apache.org/jira/browse/LUCENE-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-10248.
----------------------------------
Fix Version/s: 9.1
Resolution: Fixed
Thank you [~xavier.sanchez] for the great contribution and write-up!
> Add SpanishPluralStemFilter
> ---------------------------
>
> Key: LUCENE-10248
> URL: https://issues.apache.org/jira/browse/LUCENE-10248
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 9.0
> Reporter: Xavier Sanchez Loro
> Priority: Major
> Fix For: 9.1
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We propose a new Spanish stemmer just for stemming plural to singular whilst maintaining gender: the SpanishPluralStemmer. Our goal is to provide a lightweight algorithmic approach with better precision and recall than current approaches.
> In the following [article|https://medium.com/inside-wallapop/spanish-plural-stemmer-matching-plural-and-singular-forms-in-spanish-using-lucene-93e005e38373] we compare different Spanish stemmers across use cases and describe the value our contribution adds.
> Our solution is an algorithmic approach based on the Spanish rules for building plural forms, as defined in [wikilengua|http://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n)].
> Some characteristics:
> * Designed to stem just plural to singular form
> * Distinguishes between masculine and feminine forms
> * It will increase recall but precision can be reduced depending on the use case/information need
> * Stems plural words of foreign origin: e.g. complots, bits, punks, robots
> * Supports invariant words, where the singular and plural forms are identical or a plural does not make sense: e.g. crisis, jueves, lapsus, abrebotellas
> * Supports special cases: e.g. yoes, clubes, itemes, faralaes
> * Use it when the distinction between singular and plural is not relevant but gender is relevant
> * Produces meaningful tokens in singular form
> ** No strange stems like “amig”: it is true that stemmers need not generate grammatically correct tokens, but generating correct stems reduces the chance of collisions with other words
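
To make the characteristics listed above concrete, here is a minimal, self-contained Java sketch of a rule-based plural-to-singular stemmer. It is not the code contributed in LUCENE-10248: the class name PluralSketch, the stem method, the word lists and the three suffix rules are all assumptions made for illustration, and the rule set is far smaller than the one described on wikilengua. It only shows the general shape of the idea: check invariant words first, then special cases, then apply suffix rules that leave the gender vowel in place.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the Lucene SpanishPluralStemmer.
class PluralSketch {
    // Invariant words (examples from the issue text) pass through unchanged.
    static final Set<String> INVARIANT =
        Set.of("crisis", "jueves", "lapsus", "abrebotellas");

    // A few special cases from the issue text, mapped to their singular.
    static final Map<String, String> SPECIAL =
        Map.of("yoes", "yo", "clubes", "club", "itemes", "item");

    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    static String stem(String word) {
        if (INVARIANT.contains(word)) {
            return word;
        }
        String special = SPECIAL.get(word);
        if (special != null) {
            return special;
        }
        int n = word.length();
        // Assumed rule 1: vowel + "ces" -> vowel + "z" (luces -> luz).
        if (n > 4 && word.endsWith("ces") && isVowel(word.charAt(n - 4))) {
            return word.substring(0, n - 3) + "z";
        }
        // Assumed rule 2: vowel + one of d/l/r/n/j + "es" -> drop "es"
        // (flores -> flor, ciudades -> ciudad).
        if (n > 4 && word.endsWith("es")
                && "dlrnj".indexOf(word.charAt(n - 3)) >= 0
                && isVowel(word.charAt(n - 4))) {
            return word.substring(0, n - 2);
        }
        // Assumed rule 3: otherwise drop a final "s", which keeps the gender
        // vowel (amigos -> amigo, amigas -> amiga) and also covers foreign
        // plurals such as robots -> robot.
        if (n > 2 && word.endsWith("s")) {
            return word.substring(0, n - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"amigos", "amigas", "crisis", "clubes", "robots"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

The sketch expects lowercase input without accent handling, and it will mis-stem many words that the real rule set covers (for example, it would turn "cines" into "cin"), which is exactly the kind of case the invariant and special-case lists exist to absorb.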
--
This message was sent by Atlassian Jira
(v8.20.1#820001)