Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2011/06/21 00:16:47 UTC

[jira] [Commented] (LUCENE-2341) explore morfologik integration

    [ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052246#comment-13052246 ] 

Robert Muir commented on LUCENE-2341:
-------------------------------------

Hi Michał,

This patch looks great!

I took a quick glance; here are a couple of suggestions:
* In MorfologikFilter, I think we should implement reset(): first call the superclass reset(), then clear the stemsAcc list. This ensures that all of the filter's state is cleared before it is reused. Under normal operation this should not be necessary, but some consumers in Lucene (e.g. LimitTokenCountFilter, and some similar code in the Highlighter) consume the stream only up to some point and then stop. Clearing the list in reset() ensures that no leftover stems can appear in the next stream.
* Because the data is licensed under the MPL, I think NOTICE.txt should explicitly include a hyperlink to the source code used, if possible. I saw you included some wording in LICENSE.txt, but I think that file should only say 'XYZ data is under this license', followed by the actual MPL license text. The link to the source code belongs in NOTICE.txt. There is more information on this under the section "Category B: Reciprocal Licenses" at http://www.apache.org/legal/3party.html
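
To make the reset() point concrete, here is a rough, dependency-free sketch of the pattern. It does not use Lucene's actual classes: SketchStream, SketchSource and SketchStemFilter are hypothetical stand-ins, and the toy "stemmer" just appends suffixes. Only the stemsAcc name and the order of operations in reset() come from the suggestion above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream; NOT Lucene's TokenStream API. */
abstract class SketchStream {
    /** Returns the next token, or null when the stream is exhausted. */
    abstract String next();

    /** Prepares the stream for reuse; the base class holds no state. */
    void reset() {}
}

/** Stand-in source that can be pointed at new input between uses. */
class SketchSource extends SketchStream {
    private Iterator<String> it = List.<String>of().iterator();

    void setInput(List<String> tokens) {
        it = tokens.iterator();
    }

    @Override
    String next() {
        return it.hasNext() ? it.next() : null;
    }
}

/** Emits several candidate stems per input token, buffered in stemsAcc. */
class SketchStemFilter extends SketchStream {
    private final SketchStream input;
    private final Deque<String> stemsAcc = new ArrayDeque<>();

    SketchStemFilter(SketchStream input) {
        this.input = input;
    }

    @Override
    String next() {
        if (stemsAcc.isEmpty()) {
            String token = input.next();
            if (token == null) {
                return null;
            }
            // Toy stemmer (assumption): two candidate stems per token.
            stemsAcc.add(token + "#1");
            stemsAcc.add(token + "#2");
        }
        return stemsAcc.poll();
    }

    @Override
    void reset() {
        // Plays the role of calling the superclass reset() in a real filter.
        input.reset();
        // Drop stems left over from a stream that was only partially consumed.
        stemsAcc.clear();
    }
}
```

If a consumer reads only the first stem of a token and then reuses the filter on new input, the second stem is still sitting in stemsAcc. Without the clear() call it would be returned as the first token of the next stream.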


> explore morfologik integration
> ------------------------------
>
>                 Key: LUCENE-2341
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2341
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Robert Muir
>            Assignee: Dawid Weiss
>         Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option for users.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
