Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2012/05/17 15:41:07 UTC
[jira] [Updated] (LUCENE-4065) FilteringTokenFilter should never corrupt the tokenstream graph
[ https://issues.apache.org/jira/browse/LUCENE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-4065:
--------------------------------
Attachment: LUCENE-4065_test.patch
Test case (boiled down from TestRandomChains).
A much simpler one could be made.
> FilteringTokenFilter should never corrupt the tokenstream graph
> ---------------------------------------------------------------
>
> Key: LUCENE-4065
> URL: https://issues.apache.org/jira/browse/LUCENE-4065
> Project: Lucene - Java
> Issue Type: Bug
> Components: modules/analysis
> Reporter: Robert Muir
> Attachments: LUCENE-4065_test.patch
>
>
> Currently, removers like StopFilter have an option (true/false) to enable position increments.
> If it's true: it both inserts gaps where necessary AND propagates gaps down the stream.
> If it's false: it does neither, which can totally mess up the tokenstream graph (e.g. move synonyms to another word).
> There are totally valid, natural use cases for false, where you don't want gaps because you want phrase queries to act as if the word was never actually there.
> But 'not inserting gaps' is separate from proper propagation of existing gaps.
> So I think we should provide an option (either fix 'false' or make it an enum) where you still get a legit tokenstream and don't totally screw it up, but you simply omit gaps.
> See LUCENE-3848 for more information (where we at least fixed this case to not begin the tokenstream with posinc=0).
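
To make the difference between the two settings concrete, here is a minimal, Lucene-free sketch of the behaviour described above. Token, removeStopWords, describe and sample are hypothetical names made up for illustration, not Lucene API, and the sample tokens are invented. The real filter works on stream attributes rather than lists, so treat this only as a model of the position arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a removing filter; NOT the Lucene implementation.
class PosIncSketch {

    /** Stand-in for a token: a term plus its position increment. */
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /**
     * Drops every token whose term is in stopWords.
     * enablePositionIncrements == true : the increments of removed tokens are
     *   added to the next surviving token (gaps are inserted and propagated).
     * enablePositionIncrements == false: the increments of removed tokens are
     *   simply discarded (assumed here to match the behaviour described above).
     */
    static List<Token> removeStopWords(List<Token> in, Set<String> stopWords,
                                       boolean enablePositionIncrements) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (Token t : in) {
            if (stopWords.contains(t.term)) {
                if (enablePositionIncrements) {
                    skipped += t.posInc;
                }
                continue;
            }
            out.add(new Token(t.term, t.posInc + skipped));
            skipped = 0;
        }
        return out;
    }

    /** Renders each token as term@absolutePosition, with positions starting at 0. */
    static String describe(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc;
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.term).append('@').append(pos);
        }
        return sb.toString();
    }

    /** Invented input: "television" is stacked on "tv" as a synonym (posInc=0). */
    static List<Token> sample() {
        return Arrays.asList(
            new Token("wifi", 1),
            new Token("tv", 1),
            new Token("television", 0),
            new Token("antenna", 1));
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("tv"));
        List<Token> in = sample();
        System.out.println(describe(in));
        // prints: wifi@0 tv@1 television@1 antenna@2
        System.out.println(describe(removeStopWords(in, stop, true)));
        // prints: wifi@0 television@1 antenna@2
        System.out.println(describe(removeStopWords(in, stop, false)));
        // prints: wifi@0 television@0 antenna@1
    }
}
```

With the option set to false, "television" ends up at the same position as "wifi", which is the "synonym moved to another word" corruption the description mentions. The sketch does not attempt the proposed third behaviour (omit gaps but keep the graph intact), since the issue leaves its exact semantics open.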