Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2015/07/29 16:28:05 UTC

[jira] [Commented] (SOLR-7848) Strictly enforce charFilter/tokenizer/filter order in fieldType definitions

    [ https://issues.apache.org/jira/browse/SOLR-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646116#comment-14646116 ] 

Steve Rowe commented on SOLR-7848:
----------------------------------

+1, accepting out-of-order field type definitions is a misfeature.

> Strictly enforce charFilter/tokenizer/filter order in fieldType definitions
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-7848
>                 URL: https://issues.apache.org/jira/browse/SOLR-7848
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 5.2.1
>            Reporter: Shawn Heisey
>            Priority: Minor
>
> Currently you can define a fieldType with the components specified backwards:
> {noformat}
>     <fieldType name="icu_test" class="solr.TextField">
>       <analyzer> 
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <tokenizer class="solr.ICUTokenizerFactory"/>
>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>       </analyzer>
>     </fieldType>
> {noformat}
> This works (just tested in 5.2.1), but the components run in exactly the opposite order from the one in which they are defined: the charFilter first, then the tokenizer, then the filter.
> The moinmoin wiki page for Analyzers, Tokenizers, and TokenFilters, in the section for HTMLStripCharFilterFactory, states that charFilter definitions must come before the tokenizer.  This bit of documentation is wrong.
> The easiest fix would be to correct the wiki page, but if the order in the config can be detected, we could emit a warning in 5.x when the order is wrong and fail to start the core in 6.0.
> When I was first building my schema, back in the 1.4 days, I was thoroughly confused and caught off guard when I tried to use PatternReplaceCharFilterFactory.  I found that it was being executed before tokenization, even though I had defined it AFTER the tokenizer.  I did eventually figure out my mistake and switched to PatternReplaceFilterFactory.  If the correct order had been enforced, or if the incorrect order had caused a warning in the log, I would have figured it out a lot sooner.
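
The order check proposed in the description can be sketched roughly as follows. This is only an illustration in Python using the standard library XML parser, not Solr's actual schema loading code (which is Java); the function name find_order_violations and the EXPECTED_ORDER table are hypothetical names chosen for this sketch. It walks the children of each <analyzer> element and reports any component declared after one that runs later in the chain.

```python
import xml.etree.ElementTree as ET

# Assumed declaration order inside <analyzer>; it mirrors the order in
# which the analysis chain runs: charFilter(s), tokenizer, filter(s).
EXPECTED_ORDER = {"charFilter": 0, "tokenizer": 1, "filter": 2}


def find_order_violations(field_type_xml):
    """Return one message per analyzer component declared out of order.

    Hypothetical helper for illustration; not part of Solr.
    """
    root = ET.fromstring(field_type_xml)
    problems = []
    for analyzer in root.iter("analyzer"):
        highest_rank = -1
        highest_tag = None
        for child in analyzer:
            rank = EXPECTED_ORDER.get(child.tag)
            if rank is None:
                continue  # skip elements this sketch does not know about
            if rank < highest_rank:
                problems.append(
                    "<%s class=%r> is declared after <%s>"
                    % (child.tag, child.get("class"), highest_tag)
                )
            else:
                highest_rank, highest_tag = rank, child.tag
    return problems


# The backwards definition from the issue description.
BACKWARDS = """
<fieldType name="icu_test" class="solr.TextField">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
  </analyzer>
</fieldType>
"""

for message in find_order_violations(BACKWARDS):
    print("WARNING:", message)
```

For the backwards definition above, the sketch reports two problems: the tokenizer and the charFilter are both declared after the filter. A caller could log these as warnings or treat them as fatal, matching the 5.x and 6.0 behaviors suggested in the description.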


