Posted to dev@lucenenet.apache.org by "Itamar Syn-Hershko (JIRA)" <ji...@apache.org> on 2017/05/05 10:10:04 UTC

[jira] [Commented] (LUCENENET-547) Replace Spanish suffixes by Portuguese suffixes in the Portuguese snowball stemmer

    [ https://issues.apache.org/jira/browse/LUCENENET-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998030#comment-15998030 ] 

Itamar Syn-Hershko commented on LUCENENET-547:
----------------------------------------------

This is also the case with Apache Lucene (Java):

https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/analysis/common/src/java/org/tartarus/snowball/ext/PortugueseStemmer.java#L84

I believe the right thing to do for Lucene.NET is to leave it as-is: analyzers are expected to behave the same in .NET and Java, and as a by-product, indexes stay readable by both. It is easy enough to create your own analyzer by copying the code and fixing what needs to be fixed. It might also make sense to notify the Apache Lucene project so they can fix it in future releases.

> Replace Spanish suffixes by Portuguese suffixes in the Portuguese snowball stemmer
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENENET-547
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-547
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Contrib
>            Reporter: Helder
>              Labels: stemmer
>
> In PortugueseStemmer.cs [1], there are a few suffixes which I believe were copied by mistake from SpanishStemmer.cs [2]:
> * "log\u00EDas" should be "logias" (line 137)
> * "log\u00EDa" should be "logia" (line 113)
> * "uciones" should be "uções" (line 139)
> * "uci\u00F3n" should be "ução" (line 120)
> For more details, see the original report on the NLTK project:
> https://github.com/nltk/nltk/issues/754
> [1] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/PortugueseStemmer.cs
> [2] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/SpanishStemmer.cs
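The reported suffixes can be checked quickly. A Portuguese word such as "biologia" ends in "logia" but not in "log\u00EDa" (logía), so the Spanish-style suffixes cannot match it. The Java sketch below is a hypothetical illustration, not Lucene or Lucene.NET code. The class name SuffixCheck and its helper are made up for this example. It only compares raw string endings and ignores the stemmer's R1/R2 region checks and any character preprocessing the generated stemmer may perform.

```java
// Hypothetical illustration, not Lucene code: compares raw word endings only,
// ignoring the Snowball R1/R2 regions and any preprocessing in the real stemmer.
public class SuffixCheck {
    // Suffixes as reported in PortugueseStemmer (apparently copied from Spanish).
    static final String[] SPANISH_STYLE = {"log\u00EDas", "log\u00EDa", "uciones", "uci\u00F3n"};
    // Replacements proposed in the issue: logias, logia, uções, ução.
    static final String[] PORTUGUESE = {"logias", "logia", "u\u00E7\u00F5es", "u\u00E7\u00E3o"};

    // True if the word ends with at least one of the given suffixes.
    static boolean endsWithAny(String word, String[] suffixes) {
        for (String s : suffixes) {
            if (word.endsWith(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Ordinary Portuguese words: biologia, tecnologias, evolução, soluções.
        String[] words = {"biologia", "tecnologias", "evolu\u00E7\u00E3o", "solu\u00E7\u00F5es"};
        for (String w : words) {
            System.out.printf("%-12s spanish-style=%b portuguese=%b%n",
                    w, endsWithAny(w, SPANISH_STYLE), endsWithAny(w, PORTUGUESE));
        }
    }
}
```

For each of the four sample words, the Spanish-style list finds no match while the proposed Portuguese list does.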



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)