Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2008/08/08 18:51:44 UTC
[jira] Commented: (LUCENE-1350) Filters which are "consumers"
should not reset the payload or flags and should better reuse the token
[ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620970#action_12620970 ]
Michael McCandless commented on LUCENE-1350:
--------------------------------------------
It seems like there are three different things here:
# Many filters (eg SnowballFilter) incorrectly erase the Payload,
token Type and token flags, because they effectively do their own
Token cloning. This is pre-existing (it pre-dates the re-use API).
# Separately, these filters do not use the re-use API, which we want
to migrate to anyway.
# Adding new "reuse" methods on Token which are like clear() except
they also take args to replace the termBuffer, start/end offset,
etc, and they do not clear the payload/flags to their defaults.
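To make point 1 concrete, here is a minimal sketch (simplified stand-in classes, not the actual Lucene API) of how a filter that builds a fresh Token inside next() silently drops the payload and flags that an upstream filter set:

```java
// Hypothetical, simplified stand-in for org.apache.lucene.analysis.Token,
// illustrating the payload-loss bug in "consumer" filters.
public class CloneBugSketch {
    static class Token {
        String term;
        int flags;       // 0 by default, like a fresh Token
        byte[] payload;  // null by default
        Token(String term) { this.term = term; }
    }

    // What a filter like SnowballFilter effectively did: copy only the
    // term text into a brand-new Token, so everything else is lost.
    static Token stemOldWay(Token in) {
        String stemmed = in.term.endsWith("s")
                ? in.term.substring(0, in.term.length() - 1)
                : in.term;
        return new Token(stemmed);  // payload and flags are dropped here
    }

    public static void main(String[] args) {
        Token in = new Token("cats");
        in.payload = new byte[] {42};
        in.flags = 1;

        Token out = stemOldWay(in);
        System.out.println(out.term);             // cat
        System.out.println(out.payload == null);  // true -> payload lost
        System.out.println(out.flags);            // 0    -> flags lost
    }
}
```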
Since in LUCENE-1333 we are aggressively moving all Lucene core &
contrib TokenStream & TokenFilters to use the re-use API (formally
deprecating the original non-reuse API), we may as well fix 1 & 2 at
once.
I think the reuse API proposal is reasonable: it mirrors the current
constructors on Token. But, since we are migrating to the reuse API,
you need the analog of all these constructors without making a new
Token.
But maybe change the name from "reuse" to "update", "set", "reset",
"reinit", or "change"? Still: I think this method should reset the
payload, position increment, etc, to defaults? Ie calling this method
should get you the same result as creating a new Token(...), passing
in the termBuffer, start/end offset, etc, I think?
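A hedged sketch of that semantics, using an illustrative class and method name (not the actual Lucene API): a reinit that reuses the Token and its char buffer but resets payload and position increment to the same defaults the Token(...) constructor would give:

```java
// Simplified sketch of the proposed "reinit" semantics on a Token-like class.
public class ReinitSketch {
    static class Token {
        char[] termBuffer = new char[16];
        int termLength;
        int startOffset, endOffset;
        int positionIncrement = 1;
        byte[] payload;  // null by default, like a fresh Token
        int flags;

        // Mirrors a Token(text, start, end) constructor, but reuses this
        // instance and grows the char buffer only when needed.
        Token reinit(String text, int start, int end) {
            if (termBuffer.length < text.length()) {
                termBuffer = new char[text.length()];
            }
            text.getChars(0, text.length(), termBuffer, 0);
            termLength = text.length();
            startOffset = start;
            endOffset = end;
            // Reset everything else to constructor defaults:
            positionIncrement = 1;
            payload = null;
            flags = 0;
            return this;
        }

        String term() { return new String(termBuffer, 0, termLength); }
    }

    public static void main(String[] args) {
        Token t = new Token();
        t.payload = new byte[] {1};
        t.positionIncrement = 3;

        t.reinit("foo", 0, 3);
        System.out.println(t.term());             // foo
        System.out.println(t.payload == null);    // true -> back to default
        System.out.println(t.positionIncrement);  // 1    -> back to default
    }
}
```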
Should we just absorb this issue into LUCENE-1333? DM, of your list
above (of filters that lose the payload), are there any that are not
fixed in LUCENE-1333? I'm confused about the overlap, and it's hard to
work with all the patches. Actually, if in LUCENE-1333 you could
consolidate down to a single patch (a big top-level "svn diff"),
that'd be great :)
> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1350
> URL: https://issues.apache.org/jira/browse/LUCENE-1350
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/*
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Fix For: 2.3.3
>
> Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have a similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.
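The fix the description proposes can be sketched as follows (simplified stand-in classes, not the actual Lucene TokenStream API): a "consumer" filter implementing next(Token) that mutates only the term buffer in place, so the payload set upstream survives:

```java
// Sketch of a reuse-style filter that preserves the payload while stemming.
public class ReuseFilterSketch {
    static class Token {
        char[] termBuffer = new char[16];
        int termLength;
        byte[] payload;
        void setTerm(String s) {
            if (termBuffer.length < s.length()) termBuffer = new char[s.length()];
            s.getChars(0, s.length(), termBuffer, 0);
            termLength = s.length();
        }
        String term() { return new String(termBuffer, 0, termLength); }
    }

    interface TokenStream { Token next(Token reusable); }

    // Toy "stemmer" standing in for SnowballFilter: strips a trailing 's'.
    static class StemFilter implements TokenStream {
        final TokenStream input;
        StemFilter(TokenStream input) { this.input = input; }

        public Token next(Token reusable) {
            Token t = input.next(reusable);
            if (t == null) return null;
            String term = t.term();
            if (term.endsWith("s")) {
                t.setTerm(term.substring(0, term.length() - 1));
            }
            return t;  // payload (and any flags/type) left untouched
        }
    }

    public static void main(String[] args) {
        TokenStream source = new TokenStream() {
            boolean done;
            public Token next(Token reusable) {
                if (done) return null;
                done = true;
                reusable.setTerm("dogs");
                reusable.payload = new byte[] {9};  // payload set upstream
                return reusable;
            }
        };
        Token out = new StemFilter(source).next(new Token());
        System.out.println(out.term());      // dog
        System.out.println(out.payload[0]);  // 9 -> payload survived
    }
}
```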
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org