Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2015/07/29 16:28:05 UTC
[jira] [Commented] (SOLR-7848) Strictly enforce charFilter/tokenizer/filter order in fieldType definitions
[ https://issues.apache.org/jira/browse/SOLR-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646116#comment-14646116 ]
Steve Rowe commented on SOLR-7848:
----------------------------------
+1, accepting out-of-order field type definitions is a misfeature.
> Strictly enforce charFilter/tokenizer/filter order in fieldType definitions
> ---------------------------------------------------------------------------
>
> Key: SOLR-7848
> URL: https://issues.apache.org/jira/browse/SOLR-7848
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 5.2.1
> Reporter: Shawn Heisey
> Priority: Minor
>
> Currently you can define a fieldType with the components specified backwards:
> {noformat}
> <fieldType name="icu_test" class="solr.TextField">
>   <analyzer>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <tokenizer class="solr.ICUTokenizerFactory"/>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>   </analyzer>
> </fieldType>
> {noformat}
> This works (just tested in 5.2.1), but the components run in exactly the opposite order from how they are defined: the charFilter first, then the tokenizer, then the filter.
> The MoinMoin wiki page for Analyzers, Tokenizers, and TokenFilters, in the section for HTMLStripCharFilterFactory, states that charFilter definitions must come before the tokenizer. That statement is wrong: as the example above shows, Solr accepts the definitions in any order.
> The easiest fix would be to correct the wiki page, but if the order in the config can be detected, we could emit a warning in 5.x when the order is wrong and fail to start the core in 6.0.
> When I was first building my schema, back in the 1.4 days, I was thoroughly confused and caught off guard when I tried to use PatternReplaceCharFilterFactory. I found that it was being executed before tokenization, even though I had defined it AFTER the tokenizer. I did eventually figure out my mistake and switched to PatternReplaceFilterFactory. If the correct order had been enforced, or if the incorrect order had caused a warning in the log, I would have figured it out a lot sooner.
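The order check proposed in the quoted issue can be illustrated with a small offline lint. This is only a sketch in Python, not Solr's actual implementation (Solr itself is Java, and the real check would live in its schema loading code). The function name `out_of_order_components` and the `EXPECTED_RANK` table are hypothetical names introduced for illustration; the only assumption about Solr is the one stated in the issue, namely that components are applied as charFilter, then tokenizer, then filter.

```python
import xml.etree.ElementTree as ET

# Rank of each analysis component in the order the issue says they are applied:
# charFilter first, then tokenizer, then filter. (Hypothetical helper table.)
EXPECTED_RANK = {"charFilter": 0, "tokenizer": 1, "filter": 2}


def out_of_order_components(field_type_xml):
    """Return one warning string per analyzer child declared out of order."""
    root = ET.fromstring(field_type_xml)
    warnings = []
    for analyzer in root.iter("analyzer"):
        highest_rank = -1
        highest_tag = None
        for child in analyzer:
            rank = EXPECTED_RANK.get(child.tag)
            if rank is None:
                continue  # ignore elements that are not analysis components
            if rank < highest_rank:
                # This component is declared after one that runs later than it.
                warnings.append(
                    "<%s> (%s) is declared after <%s>"
                    % (child.tag, child.get("class"), highest_tag)
                )
            else:
                highest_rank = rank
                highest_tag = child.tag
    return warnings


# The backwards definition from the issue description.
BACKWARDS = """
<fieldType name="icu_test" class="solr.TextField">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
  </analyzer>
</fieldType>
"""

# The same components declared in the order they actually run.
IN_ORDER = """
<fieldType name="icu_test" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
"""

if __name__ == "__main__":
    for message in out_of_order_components(BACKWARDS):
        print("WARNING:", message)
```

A check along these lines could log the returned messages as warnings in 5.x and treat a non-empty result as a fatal error in 6.0, as the issue suggests.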