Posted to dev@lucenenet.apache.org by "Shad Storhaug (JIRA)" <ji...@apache.org> on 2017/06/28 18:43:00 UTC

[jira] [Commented] (LUCENENET-551) Latin language Stemmer (feature request)

    [ https://issues.apache.org/jira/browse/LUCENENET-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067035#comment-16067035 ] 

Shad Storhaug commented on LUCENENET-551:
-----------------------------------------

I am curious, do you still think this would be useful? If so, would you be interested in taking on this project? It doesn't look like it would be too difficult. Or, have you already done it?

Now that we are on Lucene 4.8.0, the contrib project is gone and the Snowball analyzers are part of the Lucene.Net.Analysis.Common project. They originated from the Snowball project (http://snowball.tartarus.org/), which has since moved to https://github.com/snowballstem/snowball. The original had no Latin stemmer, but I don't think it would be very difficult to port from Ruby.

That said, this is something that would put us out of sync with Lucene, since they don't have a Latin Snowball analyzer. So it feels like it doesn't belong here (instead it should be in its own repo). On the other side of that argument, it would be a lot easier to keep in version sync with Lucene.Net if it were in our repo. And if it were contributed directly to Lucene, it would take many months/years to trickle down to Lucene.Net. Itamar, what are your thoughts on this?

> Latin language Stemmer (feature request)
> ----------------------------------------
>
>                 Key: LUCENENET-551
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-551
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Contrib, Lucene.Net.Analysis.Common
>    Affects Versions: Lucene.Net 3.0.3, Lucene.Net 4.8.0
>            Reporter: Peter Halasz
>
> I would find a Latin language stemmer very helpful. The Schinke Latin stemming algorithm has been converted to Snowball here: http://snowball.tartarus.org/otherapps/schinke/intro.html . I have not worked out how to compile Snowball into .cs to try it.
> There are currently five Romance languages supported (French, Spanish, Portuguese, Italian, Romanian), so if the above doesn't work, I imagine one of these could be modified to support Latin.
> I realise SF.Snowball is considered a contrib package rather than core, but Lucene.Net seems to be the main place where Snowball stemmers are provided and maintained for C# / .Net.
> Note, other language ports of Snowball support Latin (using the Schinke contribution), such as Ruby: https://github.com/aurelian/ruby-stemmer


