Posted to dev@lucene.apache.org by "Mike Sokolov (JIRA)" <ji...@apache.org> on 2018/04/05 14:27:00 UTC
[jira] [Updated] (LUCENE-8240) Make TokenStreamComponents.setReader public
[ https://issues.apache.org/jira/browse/LUCENE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Sokolov updated LUCENE-8240:
---------------------------------
Summary: Make TokenStreamComponents.setReader public (was: Support different analysis per field instance)
> Make TokenStreamComponents.setReader public
> -------------------------------------------
>
> Key: LUCENE-8240
> URL: https://issues.apache.org/jira/browse/LUCENE-8240
> Project: Lucene - Core
> Issue Type: Wish
> Components: modules/analysis
> Reporter: Mike Sokolov
> Priority: Major
> Attachments: SubFieldAnalyzer.java
>
>
> The simplest way to support different analysis per field instance would be to make TokenStreamComponents.setReader() public. An alternative would be to provide a SubFieldAnalyzer along the lines of the attached one, although for the reasons given below I think this implementation is a little hacky and would ideally be supported in a different way before making *that* part of a public Lucene API.
> Exposing this method would allow a third-party extension to call it in order to wrap TokenStreamComponents. My use case is a SubFieldAnalyzer (attached, for reference) that applies different analysis to different instances of a field. This supports a big "catch-all" field whose instances receive different (index-time) text processing. We implement that by creating a TokenStreamComponents that wraps separate per-subfield components and switches among them when setReader() is called.
> Why setReader()? This is the only part of the API where we can inject this notion of subfields. setReader() is called with a Reader for each field instance, and we supply a special Reader that identifies its subfield.
> This is a bit hacky; ideally subfields would be first-class citizens in the Analyzer API, so that, e.g., there would be methods like Analyzer.createComponents(String fieldName, String subFieldName), etc. However, this seems like a pretty big change for an experimental feature, so it seems like an OK tradeoff to live with the Reader-per-subfield hack for now.
> Currently SubFieldAnalyzer has to live in the org.apache.lucene.analysis package in order to call TokenStreamComponents.setReader (on a separate instance) and satisfy Java's access rules, which is awkward.
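
To make the Reader-per-subfield idea concrete, here is a minimal sketch of the switching pattern described above. It does not use Lucene and is not the attached SubFieldAnalyzer: Components, SimpleComponents, SubFieldReader and SwitchingComponents are hypothetical stand-ins invented for illustration, and analyze() is a toy replacement for a real token stream. It only shows how a wrapper might pick a delegate when setReader() is called with a Reader that names its subfield.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for TokenStreamComponents (NOT the Lucene class).
interface Components {
    void setReader(Reader reader);
    String analyze(); // toy "analysis": returns the transformed text
}

// A leaf component that applies one fixed text transformation.
class SimpleComponents implements Components {
    private final UnaryOperator<String> transform;
    private Reader reader;

    SimpleComponents(UnaryOperator<String> transform) { this.transform = transform; }

    @Override public void setReader(Reader reader) { this.reader = reader; }

    @Override public String analyze() {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) sb.append(buf, 0, n);
            return transform.apply(sb.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical "special Reader" that identifies the subfield it belongs to.
class SubFieldReader extends FilterReader {
    private final String subField;

    SubFieldReader(String subField, Reader in) { super(in); this.subField = subField; }

    String subField() { return subField; }
}

// Wraps per-subfield components and switches among them in setReader().
class SwitchingComponents implements Components {
    private final Map<String, Components> bySubField;
    private final Components fallback;
    private Components current;

    SwitchingComponents(Map<String, Components> bySubField, Components fallback) {
        this.bySubField = bySubField;
        this.fallback = fallback;
        this.current = fallback;
    }

    @Override public void setReader(Reader reader) {
        current = fallback;
        if (reader instanceof SubFieldReader) {
            String name = ((SubFieldReader) reader).subField();
            current = bySubField.getOrDefault(name, fallback);
        }
        // This delegating call is the one that needs setReader() to be accessible.
        current.setReader(reader);
    }

    @Override public String analyze() { return current.analyze(); }
}

class SubFieldSketch {
    public static void main(String[] args) {
        Map<String, Components> map = new HashMap<>();
        map.put("title", new SimpleComponents(String::toUpperCase));
        Components c = new SwitchingComponents(map, new SimpleComponents(String::toLowerCase));

        c.setReader(new SubFieldReader("title", new StringReader("Hello")));
        System.out.println(c.analyze());

        c.setReader(new StringReader("Hello")); // plain Reader: falls back
        System.out.println(c.analyze());
    }
}
```

In the sketch the delegating call in SwitchingComponents.setReader() compiles only because the stand-in interface method is public; with the real TokenStreamComponents, an equivalent wrapper outside org.apache.lucene.analysis cannot make that call on a separate instance today.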
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)