Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2009/08/22 10:11:14 UTC

[jira] Commented: (LUCENE-1842) Add reset(AttributeSource) method to AttributeSource

    [ https://issues.apache.org/jira/browse/LUCENE-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746428#action_12746428 ] 

Uwe Schindler commented on LUCENE-1842:
---------------------------------------

I still do not understand your proposal. You can always create all tokenizer chains at the beginning with exactly one tokenizer (after LUCENE-1826). You are then free to call incrementToken() on all sub-tokenstreams and all these calls will put the tokenized values in the same attributes.
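
To illustrate the point above, here is a minimal toy sketch in plain Java. It does not use the Lucene API: ToyTermAttribute, ToySubStream and ToyConcatStream are hypothetical stand-ins invented for this example. It only shows how several sub-streams that were constructed around the same attribute instance all write their tokens into that one instance, so the consumer reads from a single place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a term attribute: one mutable slot shared by all streams. */
final class ToyTermAttribute {
    String term = "";
}

/** Hypothetical sub-stream: emits fixed tokens into the attribute instance it was given. */
final class ToySubStream {
    private final Iterator<String> tokens;
    private final ToyTermAttribute termAtt;

    ToySubStream(ToyTermAttribute sharedAtt, String... tokens) {
        this.termAtt = sharedAtt;
        this.tokens = Arrays.asList(tokens).iterator();
    }

    /** Writes the next token into the shared attribute; returns false when exhausted. */
    boolean incrementToken() {
        if (!tokens.hasNext()) {
            return false;
        }
        termAtt.term = tokens.next();
        return true;
    }
}

/** Hypothetical concatenating stream: drains each sub-stream in turn. */
final class ToyConcatStream {
    private final List<ToySubStream> subs;
    private int current = 0;

    ToyConcatStream(List<ToySubStream> subs) {
        this.subs = subs;
    }

    boolean incrementToken() {
        while (current < subs.size()) {
            if (subs.get(current).incrementToken()) {
                return true;
            }
            current++; // this sub-stream is exhausted, move on to the next one
        }
        return false;
    }
}

final class SharedAttributesSketch {
    /** Builds the whole chain up front around one attribute, then consumes it. */
    static List<String> run() {
        ToyTermAttribute shared = new ToyTermAttribute();
        ToyConcatStream concat = new ToyConcatStream(Arrays.asList(
                new ToySubStream(shared, "quick", "brown"),
                new ToySubStream(shared, "fox")));
        List<String> seen = new ArrayList<>();
        while (concat.incrementToken()) {
            seen.add(shared.term); // always read from the same attribute instance
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [quick, brown, fox]
    }
}
```

In this toy model nothing ever has to be re-pointed at a different attribute holder after construction, which is the property the comment argues for.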

Adding a reset(AttributeSource) method would not really help, as you would have to do this for the whole Tokenizer chain. If you do it the wrong way, some TokenFilters in the chain may end up using a different AttributeSource, and so on. Because of all these problems and the complexity, we do not want setters for AttributeSources, changes of the AttributeFactory, and so on. During the lifetime of one TokenStream, there is in my opinion no real use case for changing its attribute maps that justifies the added complexity and risk of errors.

The cost of adding Attributes is very low if you reuse TokenStreams, which you could do even with your concatenating TokenStream.
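
As a rough illustration of why reuse keeps this cost low, here is a toy model in plain Java. It is not Lucene code: ToyAttributeSource, ToyOffsetAttribute and the creations counter are hypothetical names made up for this sketch. The only behaviour it assumes is that addAttribute() returns the already registered instance when one exists for the requested class, so a reused stream pays for instantiation once and for a map lookup afterwards.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, simplified attribute holder; not the real Lucene AttributeSource. */
final class ToyAttributeSource {
    private final Map<Class<?>, Object> attributes = new LinkedHashMap<>();
    int creations = 0; // counts how often an attribute really had to be instantiated

    /** Returns the existing instance for the class, creating it only on first use. */
    <T> T addAttribute(Class<T> type, Supplier<T> factory) {
        Object existing = attributes.get(type);
        if (existing != null) {
            return type.cast(existing); // cheap path: just a map lookup
        }
        T created = factory.get();
        creations++;
        attributes.put(type, created);
        return created;
    }
}

/** Hypothetical attribute with two plain fields. */
final class ToyOffsetAttribute {
    int start;
    int end;
}

final class ReuseCostSketch {
    /** Reuses one source for many documents and reports how many instances were created. */
    static int creationsAfterReuse(int documents) {
        ToyAttributeSource source = new ToyAttributeSource();
        for (int i = 0; i < documents; i++) {
            ToyOffsetAttribute att =
                    source.addAttribute(ToyOffsetAttribute.class, ToyOffsetAttribute::new);
            att.start = 0;
            att.end = i;
        }
        return source.creations;
    }

    public static void main(String[] args) {
        System.out.println(creationsAfterReuse(1000)); // prints 1
    }
}
```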

> Add reset(AttributeSource) method to AttributeSource
> ----------------------------------------------------
>
>                 Key: LUCENE-1842
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1842
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tim Smith
>            Priority: Minor
>             Fix For: 2.9
>
>
> Originally proposed in LUCENE-1826
> Proposing the addition of the following method to AttributeSource
> {code}
> public void reset(AttributeSource input) {
>     if (input == null) {
>       throw new IllegalArgumentException("input AttributeSource must not be null");
>     }
>     this.attributes = input.attributes;
>     this.attributeImpls = input.attributeImpls;
>     this.factory = input.factory;
> }
> {code}
> Impacts:
> * requires all TokenStreams/TokenFilters/etc. to call addAttribute() in their reset() method, not in their constructor
> * requires making AttributeSource.attributes and AttributeSource.attributeImpls non-final
> Advantages:
> Allows creating only a single actual AttributeSource per thread that can then be used for indexing with a multitude of TokenStream/Tokenizer combinations (allowing maximum reuse of TokenStream/Tokenizer instances).
> This results in only a single "attributes"/"attributeImpls" map being required per thread.
> addAttribute() calls will almost always return right away (attributes will only be "initialized" once per thread).
