Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2010/04/18 23:29:51 UTC

[jira] Issue Comment Edited: (LUCENE-2400) ShingleFilter: don't output all-filler shingles/unigrams; also, convert from TermAttribute to CharTermAttribute

    [ https://issues.apache.org/jira/browse/LUCENE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858344#action_12858344 ] 

Uwe Schindler edited comment on LUCENE-2400 at 4/18/10 5:28 PM:
----------------------------------------------------------------

bq. I tried adding specialized versions of CharTermAttribute.append(StringBuilder,...): 

Did you also add this to the interface? Otherwise your code would not use this method. LUCENE-2401 does not have the start,end methods, as these are not even in StringBuilder.
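
To illustrate the point about the interface, here is a minimal sketch. The names TermAppender, TermAppenderImpl and OverloadDemo are hypothetical stand-ins, not the real Lucene types. Java picks the overload at compile time from the declared type of the variable, so a specialized append(StringBuilder) that exists only on the implementation class is never selected when the caller holds the attribute through its interface:

```java
// Hypothetical stand-ins for illustration only; these are not Lucene classes.
interface TermAppender {
    TermAppender append(CharSequence csq);
}

class TermAppenderImpl implements TermAppender {
    // Records which overload ran, so the effect is observable.
    String lastCalled = "none";

    @Override
    public TermAppender append(CharSequence csq) {
        lastCalled = "append(CharSequence)";
        return this;
    }

    // Specialized overload declared only on the implementation class.
    public TermAppender append(StringBuilder sb) {
        lastCalled = "append(StringBuilder)";
        return this;
    }
}

class OverloadDemo {
    // Caller sees only the interface: the compiler binds to append(CharSequence).
    static String viaInterface() {
        TermAppenderImpl impl = new TermAppenderImpl();
        TermAppender att = impl;
        att.append(new StringBuilder("abc"));
        return impl.lastCalled;
    }

    // Caller sees the implementation type: the more specific overload is chosen.
    static String viaImpl() {
        TermAppenderImpl impl = new TermAppenderImpl();
        impl.append(new StringBuilder("abc"));
        return impl.lastCalled;
    }

    public static void main(String[] args) {
        System.out.println(viaInterface());
        System.out.println(viaImpl());
    }
}
```

Since token filters normally obtain the attribute typed as the interface, the specialized method would have to be declared there as well to be picked up.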

      was (Author: thetaphi):
    bq. I tried adding specialized versions of CharTermAttribute.append(StringBuilder,...): 

Did you also add this to the interface, else your code would not use this method. LUCENE-1401 does not have the start,end methods, as this is not even in StringBuilder.
  
> ShingleFilter: don't output all-filler shingles/unigrams; also, convert from TermAttribute to CharTermAttribute
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2400
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2400
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 3.0.1
>            Reporter: Steven Rowe
>            Priority: Minor
>         Attachments: LUCENE-2400.patch, LUCENE-2400.patch, LUCENE-2400.patch
>
>
> When the input token stream to ShingleFilter has position increments greater than one, filler tokens are inserted for each position for which there is no token in the input token stream.  As a result, unigrams (if configured) and shingles can be filler-only.  Filler-only output tokens make no sense - these should be removed.
> Also, because TermAttribute has been deprecated in favor of CharTermAttribute, the patch will also convert TermAttribute usages to CharTermAttribute in ShingleFilter.
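
The filler behaviour described in the issue can be sketched without any Lucene dependency. This is a simplified simulation, not the ShingleFilter implementation: the class FillerShingleSketch, its method names, and the sample tokens are made up for illustration, and the "_" filler and single-space separator are assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified simulation of shingling over a stream with position gaps.
// All names here are hypothetical; this is not the ShingleFilter code.
class FillerShingleSketch {
    static final String FILLER = "_"; // assumed filler token

    // One slot per position: a position increment of n > 1 yields n - 1 fillers.
    static List<String> expand(String[] terms, int[] posIncs) {
        List<String> slots = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int gap = 1; gap < posIncs[i]; gap++) {
                slots.add(FILLER);
            }
            slots.add(terms[i]);
        }
        return slots;
    }

    static boolean allFiller(List<String> window) {
        for (String s : window) {
            if (!s.equals(FILLER)) {
                return false;
            }
        }
        return true;
    }

    // Emits unigrams and shingles up to maxSize; optionally skips filler-only output.
    static List<String> shingles(List<String> slots, int maxSize, boolean dropAllFiller) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < slots.size(); start++) {
            for (int size = 1; size <= maxSize && start + size <= slots.size(); size++) {
                List<String> window = slots.subList(start, start + size);
                if (dropAllFiller && allFiller(window)) {
                    continue;
                }
                out.add(String.join(" ", window));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> slots = expand(new String[] {"please", "divide", "sentence"},
                                    new int[] {1, 3, 1});
        System.out.println(shingles(slots, 2, false));
        System.out.println(shingles(slots, 2, true));
    }
}
```

With a position increment of 3 on "divide", two filler slots appear between "please" and "divide". Without the check the output includes the filler-only tokens "_" and "_ _"; with the check only outputs containing at least one real term remain.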
