You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/12/19 15:29:00 UTC

[jira] [Commented] (LUCENE-5803) Add another AnalyzerWrapper class that does not have its own cache, so delegate-only wrappers don't create thread local resources several times

    [ https://issues.apache.org/jira/browse/LUCENE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296959#comment-16296959 ] 

ASF subversion and git services commented on LUCENE-5803:
---------------------------------------------------------

Commit 9f7f76f267bd46b0069731ba1ae4990d31c33df8 in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9f7f76f ]

LUCENE-5803: Add a Solr test that we reuse analysis components across fields for the same field type


> Add another AnalyzerWrapper class that does not have its own cache, so delegate-only wrappers don't create thread local resources several times
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5803
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5803
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.10, 6.0
>
>         Attachments: LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch
>
>
> This is a followup issue for the following Elasticsearch issue: https://github.com/elasticsearch/elasticsearch/pull/6714
> Basically the problem is the following:
> - Elasticsearch has a pool of Analyzers that are used for analysis in several indexes
> - Each index uses a different PerFieldAnalyzerWrapper
> PerFieldAnalyzerWrapper uses PER_FIELD_REUSE_STRATEGY. Because of this it caches the tokenstreams for every field. If there are many fields, this are a lot. In addition, the underlying analyzers may also cache tokenstreams and other PerFieldAnalyzerWrappers do the same, although the delegate Analyzer can always return the same components.
> We should add similar code to Elasticsearch's directly to Lucene: If the delegating Analyzer just delegates per Field or just wraps CharFilters around the Reader, there is no need to cache the TokenStreamComponents a second time in the delegating Analyzers. This is only needed, if the delegating Analyzers adds additional TokenFilters (like ShingleAnalyzerWrapper).
> We should name this new class DelegatingAnalyzerWrapper extends AnalyzerWrapper. The wrapComponents method must be final, because we are not allowed to add additional TokenFilters, but unlike ES, we don't need to disallow wrapping with CharFilters.
> Internally this class uses a private ReuseStrategy that just delegates to the underlying analyzer. It does not matter here if the strategy of the delegate is global or per field, this is private to the delegate.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org