
Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2010/09/14 03:10:33 UTC

[jira] Created: (SOLR-2119) IndexSchema should log warning if <analyzer> is declared with charfilter/tokenizer/tokenfilter out of order

IndexSchema should log warning if <analyzer> is declared with charfilter/tokenizer/tokenfilter out of order
-----------------------------------------------------------------------------------------------------------

                 Key: SOLR-2119
                 URL: https://issues.apache.org/jira/browse/SOLR-2119
             Project: Solr
          Issue Type: Improvement
          Components: Schema and Analysis
            Reporter: Hoss Man


There seems to be a segment of the user population that has a hard time understanding the distinction between a charfilter, a tokenizer, and a tokenfilter -- while we can certainly try to improve the documentation about what exactly each does, and when they take effect in the analysis chain, one other thing we should do is try to educate people when they construct their <analyzer> in a way that doesn't make any sense.

At the moment, some people are attempting to do things like "move the Foo <tokenFilter/> before the <tokenizer/>" to try and get certain behavior ... at a minimum we should log a warning in this case that doing that doesn't have the desired effect.

(We could easily make such a situation fail to initialize, but I'm not convinced that would be the best course of action, since some people may have schemas where they have declared a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting "correct" results that work for them, and breaking their instance on upgrade doesn't seem like it would be productive.)
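
To make the proposed warning concrete, here is a minimal sketch of an ordering check. It is not Solr's IndexSchema code: the class `AnalyzerOrderCheck`, its methods, and the warning text are all hypothetical. It assumes the intended order is charFilter, then tokenizer, then filter, and it only collects warnings rather than failing, which matches the "log a warning" option described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr: reports <analyzer> children that are
// declared out of charFilter -> tokenizer -> filter order.
class AnalyzerOrderCheck {

    // Position of each element type in the analysis chain; -1 means "not a chain component".
    static int stage(String elementName) {
        switch (elementName) {
            case "charFilter": return 0;
            case "tokenizer":  return 1;
            case "filter":     return 2;
            default:           return -1;
        }
    }

    // Returns one warning per element that is declared after a later-stage element.
    static List<String> findOutOfOrder(String analyzerXml) {
        NodeList children;
        try {
            children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(analyzerXml)))
                    .getDocumentElement().getChildNodes();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse analyzer XML", e);
        }
        List<String> warnings = new ArrayList<>();
        int latestStage = -1;
        String latestName = null;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;
            int s = stage(child.getNodeName());
            if (s < 0) continue; // unknown elements are ignored in this sketch
            if (s < latestStage) {
                warnings.add("<" + child.getNodeName() + "> is declared after <" + latestName
                        + ">; this ordering likely does not have the intended effect");
            } else {
                latestStage = s;
                latestName = child.getNodeName();
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // A filter declared before the tokenizer, as in the scenario described above.
        String xml = "<analyzer>"
                + "<filter class='solr.LowerCaseFilterFactory'/>"
                + "<tokenizer class='solr.WhitespaceTokenizerFactory'/>"
                + "</analyzer>";
        for (String w : findOutOfOrder(xml)) {
            System.err.println("WARN: " + w);
        }
    }
}
```

Returning the messages instead of throwing keeps existing schemas loadable; a caller that wants a hard error could throw whenever the list is non-empty.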

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (SOLR-2119) IndexSchema should log warning if <analyzer> is declared with charfilter/tokenizer/tokenfilter out of order

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909511#action_12909511 ] 

Robert Muir commented on SOLR-2119:
-----------------------------------

{quote}
There seems to be a segment of the user population that has a hard time understanding the distinction between a charfilter, a tokenizer, and a tokenfilter - while we can certainly try to improve the documentation about what exactly each does, and when they take effect in the analysis chain, one other thing we should do is try to educate people when they construct their <analyzer> in a way that doesn't make any sense.
{quote}

I think we should do both, this is a great idea.

{quote}
(We could easily make such a situation fail to initialize, but I'm not convinced that would be the best course of action, since some people may have schemas where they have declared a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting "correct" results that work for them, and breaking their instance on upgrade doesn't seem like it would be productive.)
{quote}

I would prefer a hard error. I think someone who doesn't understand what tokenizers and filters do likely isn't looking at their log files either.

In my opinion, Solr should be more picky about its configuration. Oftentimes, if I haven't had enough sleep, I will type tokenFilter instead of filter, and Solr simply ignores it completely instead of raising an error.

And I can't be the only one who does this; it's not obvious that tokenizer = Tokenizer, charFilter = CharFilter, analyzer = Analyzer, but filter = TokenFilter.
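
As a rough illustration of the stricter behavior argued for here, the sketch below rejects any child of <analyzer> whose element name is not charFilter, tokenizer, or filter, so a mistyped tokenFilter fails loudly instead of being skipped. The class `StrictAnalyzerCheck` and its method are hypothetical and are not Solr's schema-loading code.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr: fails on unrecognized <analyzer> children.
class StrictAnalyzerCheck {

    // Element names this sketch accepts inside <analyzer>.
    static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("charFilter", "tokenizer", "filter"));

    // Throws IllegalArgumentException on the first child element with an unknown name.
    static void requireKnownChildren(String analyzerXml) {
        NodeList children;
        try {
            children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(analyzerXml)))
                    .getDocumentElement().getChildNodes();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse analyzer XML", e);
        }
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;
            String name = child.getNodeName();
            if (!KNOWN.contains(name)) {
                throw new IllegalArgumentException("Unknown element <" + name
                        + "> in <analyzer>; expected <charFilter>, <tokenizer>, or <filter>");
            }
        }
    }

    public static void main(String[] args) {
        // "tokenFilter" is the typo described above; the accepted element name is "filter".
        String xml = "<analyzer>"
                + "<tokenizer class='solr.WhitespaceTokenizerFactory'/>"
                + "<tokenFilter class='solr.LowerCaseFilterFactory'/>"
                + "</analyzer>";
        try {
            requireKnownChildren(xml);
        } catch (IllegalArgumentException e) {
            System.err.println("ERROR: " + e.getMessage());
        }
    }
}
```

Whether a real check should throw or only warn is exactly the trade-off under discussion in this issue; the sketch shows the throwing variant.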


