Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2013/08/13 14:00:51 UTC

[jira] [Comment Edited] (LUCENE-5170) Add getter for reuse strategy to Analyzer

    [ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738113#comment-13738113 ] 

Uwe Schindler edited comment on LUCENE-5170 at 8/13/13 11:58 AM:
-----------------------------------------------------------------

Robert: After reviewing the code:
The fixed, unchangeable "default" in AnalyzerWrapper is PerField, which carries a large overhead and should only be used in classes like PerFieldAnalyzerWrapper (this class should call super(PerField) in its own ctor). For other use cases of AnalyzerWrapper, I have to use the global strategy or the strategy of the wrapped analyzer. It looks like the current impl in AnalyzerWrapper somehow assumes that you want to wrap per field.

I would suggest making the reuse strategy a mandatory ctor parameter in Lucene trunk and adding the missing ctor in Lucene 4.x, too. The default ctor should be deprecated with a hint that using this default might be a bad idea.

My use case is:
I have lots of predefined Analyzers for several languages or functions in my search application. I also have some AnalyzerWrappers that simply turn any other analyzer into a phonetic or ASCIIFolding one (so I can use it with another field). So my wrapper just takes one of these per-language Analyzers and wraps it with an additional TokenFilter. As the underlying Analyzer uses global reuse, I need to make the wrapper global, too, which is currently impossible. Per field is a waste of resources in this case.
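
The resource argument above can be illustrated with a minimal stand-alone sketch. It is not Lucene code: the names ReuseCostSketch, Components, Strategy, Global, PerField and analyzeAll are hypothetical stand-ins, and the model ignores the per-thread storage a real analyzer uses. It only shows that a global strategy caches one set of components, while a per-field strategy caches one set per field name.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone model. These are NOT Lucene's classes, only simplified stand-ins.
final class ReuseCostSketch {

    /** Stand-in for the tokenizer/filter chain that an analyzer caches for reuse. */
    static final class Components {
        final String createdFor;
        Components(String createdFor) { this.createdFor = createdFor; }
    }

    /** Stand-in for a reuse strategy: decides how cached components are keyed. */
    interface Strategy {
        Components get(String field);
        void put(String field, Components components);
        int cachedCount();
    }

    /** Global reuse: a single cached instance shared by all fields. */
    static final class Global implements Strategy {
        private Components shared;
        public Components get(String field) { return shared; }
        public void put(String field, Components components) { shared = components; }
        public int cachedCount() { return shared == null ? 0 : 1; }
    }

    /** Per-field reuse: one cached instance for every distinct field name. */
    static final class PerField implements Strategy {
        private final Map<String, Components> byField = new HashMap<>();
        public Components get(String field) { return byField.get(field); }
        public void put(String field, Components components) { byField.put(field, components); }
        public int cachedCount() { return byField.size(); }
    }

    /** Simulates analyzing each field once; components are created only on a cache miss. */
    static int analyzeAll(Strategy strategy, String... fields) {
        int created = 0;
        for (String field : fields) {
            if (strategy.get(field) == null) {
                strategy.put(field, new Components(field));
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        String[] fields = {"title", "body", "author"};
        System.out.println("global created:    " + analyzeAll(new Global(), fields));
        System.out.println("per-field created: " + analyzeAll(new PerField(), fields));
    }
}
```

With three fields, the global model creates one set of components and the per-field model creates three, which is the overhead that a wrapper around a single global-reuse analyzer does not need.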

Only PerFieldAnalyzerWrapper should hardcode the PerField strategy (as it is per field), not the base class!

So I would suggest that the base class AnalyzerWrapper copy the ctor of the superclass Analyzer and that the default ctor be deprecated in 4.x. For my example above (wrapping another analyzer), I still need the reuse strategy of the inner analyzer, so I need a getter on Analyzer.java, too (see current patch).
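
The shape of this proposal can be sketched as a stand-alone model. This is not the attached patch and not Lucene's API: WrapperProposalSketch, BaseAnalyzer, BaseWrapper, LowercaseAnalyzer, FoldingWrapper and PerFieldWrapper are hypothetical stand-ins, and analysis is reduced to a toy String transformation. The sketch assumes three things: the base class exposes its strategy through a getter, the wrapper base class mirrors the superclass ctor, and only the per-field wrapper hardcodes the per-field strategy.

```java
import java.util.Locale;

// Stand-alone model of the proposal. These are NOT Lucene's classes.
final class WrapperProposalSketch {

    /** Stand-in for the two reuse strategies discussed in this comment. */
    enum Strategy { GLOBAL, PER_FIELD }

    /** Stand-in for Analyzer: the strategy is set in the ctor and readable via a getter. */
    static abstract class BaseAnalyzer {
        private final Strategy reuseStrategy;
        protected BaseAnalyzer(Strategy reuseStrategy) { this.reuseStrategy = reuseStrategy; }
        /** The getter that the issue asks for. */
        public final Strategy getReuseStrategy() { return reuseStrategy; }
        abstract String analyze(String text);
    }

    /** Stand-in for AnalyzerWrapper: mirrors the superclass ctor instead of hardcoding PER_FIELD. */
    static abstract class BaseWrapper extends BaseAnalyzer {
        protected BaseWrapper(Strategy reuseStrategy) { super(reuseStrategy); }
    }

    /** Stand-in for a predefined per-language analyzer that uses global reuse. */
    static final class LowercaseAnalyzer extends BaseAnalyzer {
        LowercaseAnalyzer() { super(Strategy.GLOBAL); }
        @Override String analyze(String text) { return text.toLowerCase(Locale.ROOT); }
    }

    /** Wrapper that adds one extra filter step and takes over the inner analyzer's strategy. */
    static final class FoldingWrapper extends BaseWrapper {
        private final BaseAnalyzer inner;
        FoldingWrapper(BaseAnalyzer inner) {
            super(inner.getReuseStrategy()); // only possible because the getter exists
            this.inner = inner;
        }
        @Override String analyze(String text) {
            // Toy "ASCII folding": maps a single accented letter (U+00E9) to 'e'.
            return inner.analyze(text).replace('\u00e9', 'e');
        }
    }

    /** Stand-in for PerFieldAnalyzerWrapper: the one wrapper that hardcodes per-field reuse. */
    static final class PerFieldWrapper extends BaseWrapper {
        private final BaseAnalyzer defaultAnalyzer;
        PerFieldWrapper(BaseAnalyzer defaultAnalyzer) {
            super(Strategy.PER_FIELD);
            this.defaultAnalyzer = defaultAnalyzer;
        }
        @Override String analyze(String text) { return defaultAnalyzer.analyze(text); }
    }

    public static void main(String[] args) {
        BaseAnalyzer inner = new LowercaseAnalyzer();
        BaseAnalyzer folding = new FoldingWrapper(inner);
        System.out.println(folding.getReuseStrategy() + " " + folding.analyze("Caf\u00e9"));
        System.out.println(new PerFieldWrapper(inner).getReuseStrategy());
    }
}
```

In this model, a wrapper around a global-reuse analyzer ends up with global reuse itself, and per-field reuse stays confined to the one class that needs it.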
                
> Add getter for reuse strategy to Analyzer
> -----------------------------------------
>
>                 Key: LUCENE-5170
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5170
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 5.0, 4.5
>
>         Attachments: LUCENE-5170.patch
>
>
> If you write an Analyzer that wraps another one (but without using AnalyzerWrapper), you may need to use the same reuse strategy in your wrapper. This is not possible, as there is no way to get the reuse strategy (private field and no getter).
> An example is ES's NamedAnalyzer, see my comment: [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java]
> This would add a getter, just a 3-liner.
