Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2009/09/09 01:34:57 UTC

[jira] Issue Comment Edited: (LUCENE-1903) Incorrect ShingleFilter behavior when outputUnigrams == false

    [ https://issues.apache.org/jira/browse/LUCENE-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752821#action_12752821 ] 

Robert Muir edited comment on LUCENE-1903 at 9/8/09 4:34 PM:
-------------------------------------------------------------

Here is what I think.
This is what Michael Busch said in LUCENE-1775:

{quote}
ShingleFilter and ShingleFilterTest are converted to the new API.

ShingleFilter is much more efficient now, it clones much less often and computes the tokens mostly on the fly now. 
{quote}

The conversion to the new API appears to have made it into CHANGES, but the efficiency improvement did not.
So maybe CHANGES could mention not only that ShingleFilter moved to the new API,
but also that it is more efficient, and that Chris & Uwe added tests and fixed bugs to ensure correctness?

By the way, you can take my name off the existing CHANGES entry if you want; I did nothing :)


> Incorrect ShingleFilter behavior when outputUnigrams == false
> -------------------------------------------------------------
>
>                 Key: LUCENE-1903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1903
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Chris Harris
>             Fix For: 2.9
>
>         Attachments: LUCENE-1903.patch, LUCENE-1903_testcases.patch, LUCENE-1903_testcases_lucene2_4_1_version.patch, TEST-org.apache.lucene.analysis.shingle.ShingleFilterTest.xml
>
>
> ShingleFilter isn't working as expected when outputUnigrams == false. In particular, it outputs unigrams at least some of the time even though outputUnigrams == false.
> I'll attach a patch to ShingleFilterTest.java that adds some test cases that demonstrate the problem.
> I haven't checked this, but I hypothesize that the behavior for outputUnigrams == false changed when the class was upgraded to the new TokenStream API.
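
To make the expected behavior in the issue description concrete, here is a minimal plain-Java sketch of shingle generation with an outputUnigrams switch. It is not Lucene's ShingleFilter code and does not use the TokenStream API; the class name ShingleSketch, the method shingles, and its parameters are hypothetical names chosen for illustration. It assumes the simple case of a flat token list with no position gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this is NOT Lucene's ShingleFilter implementation.
class ShingleSketch {

    // Returns word n-grams ("shingles") of up to maxShingleSize tokens,
    // grouped by start position. Single tokens (unigrams) are emitted
    // only when outputUnigrams is true.
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        int minSize = outputUnigrams ? 1 : 2;
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxShingleSize; size++) {
                int end = start + size;
                if (end > tokens.size()) {
                    break; // not enough tokens left for this shingle size
                }
                out.add(String.join(" ", tokens.subList(start, end)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // With unigrams: single tokens interleaved with the bigrams.
        System.out.println(shingles(tokens, 2, true));
        // Without unigrams: only the bigrams should appear.
        System.out.println(shingles(tokens, 2, false));
    }
}
```

In this sketch, the second call yields only "please divide", "divide this", and "this sentence"; any single-token output with outputUnigrams == false would correspond to the kind of bug reported here.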


