Posted to dev@lucene.apache.org by "Mike Sokolov (JIRA)" <ji...@apache.org> on 2018/06/12 15:06:00 UTC

[jira] [Comment Edited] (LUCENE-8352) Make TokenStreamComponents final

    [ https://issues.apache.org/jira/browse/LUCENE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509726#comment-16509726 ] 

Mike Sokolov edited comment on LUCENE-8352 at 6/12/18 3:05 PM:
---------------------------------------------------------------

{quote}So maybe we could remove this setReader method, make TokenStreamComponents final, and replace the tokenizer field with a Consumer<Reader> that would be tokenizer::setReader by default?{quote}

I think that would work for me, yes, and not too difficult either :) 
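Below is a minimal sketch of the proposal quoted above. It rests on a few assumptions. SketchTokenizer, SketchComponents and SketchWrapper are hypothetical, simplified stand-ins, not Lucene's actual API, and the sink is a plain Object rather than a TokenStream.

The sketch shows a final components class whose reader hook is a Consumer<Reader>. That hook defaults to tokenizer::setReader. It also shows a wrapper that passes the existing hook through instead of rebuilding it.

```java
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical stand-in for a tokenizer: it only records the Reader it was given.
class SketchTokenizer {
    Reader input;

    void setReader(Reader reader) {
        this.input = reader;
    }
}

// Sketch of the proposal: the class is final, and the reader hook is a
// Consumer<Reader> value instead of an overridable setReader() method.
final class SketchComponents {
    private final Consumer<Reader> source;
    private final Object sink; // stands in for the last TokenStream of the chain

    SketchComponents(Consumer<Reader> source, Object sink) {
        this.source = source;
        this.sink = sink;
    }

    // Default hook: hand the reader straight to the tokenizer (tokenizer::setReader).
    SketchComponents(SketchTokenizer tokenizer, Object sink) {
        this(tokenizer::setReader, sink);
    }

    void setReader(Reader reader) {
        source.accept(reader);
    }

    Consumer<Reader> getSource() {
        return source;
    }

    Object getSink() {
        return sink;
    }
}

// Hypothetical wrapper: it replaces the sink but reuses the wrapped source hook,
// so any custom reader handling survives the composition.
class SketchWrapper {
    static SketchComponents wrap(SketchComponents inner, Object newSink) {
        return new SketchComponents(inner.getSource(), newSink);
    }
}
```

The hook is data rather than an overridden method. As a result, a wrapper that reuses getSource() carries any custom reader handling along with it, which is the property the subclassing approach lacks.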



> Make TokenStreamComponents final
> --------------------------------
>
>                 Key: LUCENE-8352
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8352
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mark Harwood
>            Priority: Minor
>
> The current design is a little trappy. Any specialised subclass of TokenStreamComponents _(see StandardAnalyzer, ClassicAnalyzer, UAX29URLEmailAnalyzer)_ is discarded by any Analyzer that subsequently wraps it _(see LimitTokenCountAnalyzer, QueryAutoStopWordAnalyzer, ShingleAnalyzerWrapper and other examples in elasticsearch)_.
> In the current design, each AnalyzerWrapper.wrapComponents() implementation discards any custom TokenStreamComponents and replaces it with one of its own choosing (in the examples I've seen, a vanilla TokenStreamComponents).
> I fell into this trap when writing a custom TokenStreamComponents with a custom setReader(), and wondered why it was not being called when the analyzer was wrapped by other analyzers.
> If AnalyzerWrapper is designed to encourage composition, it is arguably a mistake to also permit custom TokenStreamComponents subclasses: the composition process does not preserve the choice of custom class or any behaviours it might add. For this reason we should not encourage extensions to TokenStreamComponents (or, if such extensions are required, we should somehow mark an Analyzer as "unwrappable" to prevent lossy compositions).
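The wrapping trap described above can be reproduced in a few lines. This minimal sketch uses hypothetical, simplified stand-in classes rather than Lucene's real Tokenizer, TokenStreamComponents and AnalyzerWrapper. The stand-ins are TrapTokenizer, TrapComponents, CustomComponents and TrapWrapper.

The sketch assumes only what the description states. A components class may be subclassed to override setReader(). A wrapper rebuilds a plain components object from the wrapped tokenizer.

```java
import java.io.Reader;

// Hypothetical stand-in for a tokenizer: it only records the Reader it was given.
class TrapTokenizer {
    Reader input;

    void setReader(Reader reader) {
        input = reader;
    }
}

// Non-final components class: subclasses may override setReader().
class TrapComponents {
    private final TrapTokenizer source;
    private final Object sink; // stands in for the last TokenStream of the chain

    TrapComponents(TrapTokenizer source, Object sink) {
        this.source = source;
        this.sink = sink;
    }

    void setReader(Reader reader) {
        source.setReader(reader);
    }

    TrapTokenizer getTokenizer() {
        return source;
    }

    Object getTokenStream() {
        return sink;
    }
}

// A "specialised" subclass that adds behaviour in setReader().
class CustomComponents extends TrapComponents {
    int customCalls = 0;

    CustomComponents(TrapTokenizer source, Object sink) {
        super(source, sink);
    }

    @Override
    void setReader(Reader reader) {
        customCalls++; // e.g. reset some per-document state
        super.setReader(reader);
    }
}

// Mirrors the wrapComponents() pattern described in the issue: a vanilla
// components object is built from the tokenizer, so the override is dropped.
class TrapWrapper {
    static TrapComponents wrapComponents(TrapComponents inner, Object newSink) {
        return new TrapComponents(inner.getTokenizer(), newSink);
    }
}
```

Feeding a reader to the wrapped components still reaches the tokenizer. However, CustomComponents.setReader() is never invoked. This is the silent loss of behaviour the issue describes.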



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
