Posted to solr-user@lucene.apache.org by Rajani Maski <ra...@gmail.com> on 2012/07/25 10:15:59 UTC
Significance of Analyzer Class attribute
Hi, what is the significance of the analyzer class attribute?
When I specify an analyzer class in the schema, as below, and run
analysis on this field on the analysis page, I can't see verbose output for
the tokenizer and filters:
<fieldType name="text_chinese" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>
*But if I don't add the analyzer class, I can see the verbose output based on
the tokenizer and filters applied:*
<fieldType name="text_chinese" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>
Why can't I see it in the first case? What happens when I specify an
analyzer class? Does Solr use a default if I do not specify the class
attribute in the analyzer tag?
Thanks & Regards
Rajani
Re: Significance of Analyzer Class attribute
Posted by Lance Norskog <go...@gmail.com>.
An Analyzer object is a chain of a Tokenizer and TokenFilters. A text
type definition either names an Analyzer class or describes the
Tokenizer and TokenFilters directly. An Analyzer class creates its
own sequence of a Tokenizer and possibly TokenFilters, hard-coded in the
class. In schema.xml, you will find text types with
Tokenizer/Filter chains, or with just an Analyzer.
In your case, take the Analyzer class out of the specification.
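For illustration, the two valid forms might look like the sketch below. The field type names text_chinese_analyzer and text_chinese_chain are made up for this example, and both forms assume the smartcn analysis jar is on Solr's classpath. The two forms are not guaranteed to produce identical tokens, since the Analyzer class may apply steps beyond the two listed in Form 2.

<!-- Form 1: a Lucene Analyzer class alone; its chain is hard-coded in the class -->
<fieldType name="text_chinese_analyzer" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>

<!-- Form 2: the chain spelled out stage by stage in schema.xml -->
<fieldType name="text_chinese_chain" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>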
On Wed, Jul 25, 2012 at 5:19 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>
>> When I specify an analyzer class in the schema, as below, and run
>> analysis on this field on the analysis page, I can't see
>> verbose output for the tokenizer and filters
>>
>> <fieldType name="text_chinese"
>> class="solr.TextField">
>> <analyzer
>> class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
>> <tokenizer
>> class="solr.SmartChineseSentenceTokenizerFactory"/>
>> <filter
>> class="solr.SmartChineseWordTokenFilterFactory"/>
>> </analyzer>
>> </fieldType>
>>
>>
>> *But if I don't add the analyzer class, I can see the verbose
>> output based on the tokenizer and filters applied.*
>
> The config above is wrong. You cannot combine an analyzer class with tokenizer and filter elements. If you want to use a Lucene analyzer in schema.xml, there should be only the analyzer definition.
>
> It is highly recommended to use Solr's charFilter(s), tokenizer, and tokenFilter(s) in schema.xml.
>
>
>
--
Lance Norskog
goksron@gmail.com
Re: Significance of Analyzer Class attribute
Posted by Rajani Maski <ra...@gmail.com>.
Hi All,
Thank you for the replies.
--Regards
Rajani
On Fri, Jul 27, 2012 at 9:58 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : > When I specify an analyzer class in the schema, as below, and run
> : > analysis on this field on the analysis page, I can't see
> : > verbose output for the tokenizer and filters
>
> The reason is that if you use an explicit Analyzer
> implementation, the analysis tool doesn't know what the individual phases
> of the TokenFilters are -- the Analyzer API doesn't expose that
> information (some Analyzers may be monolithic and not made up of
> individual TokenFilters).
>
>
> : > <fieldType name="text_chinese"
> : > class="solr.TextField">
> : > <analyzer
> : > class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
> : > <tokenizer
> ...
>
> : The config above is wrong. You cannot combine an analyzer class
> : with tokenizer and filter elements. If you want to use a Lucene analyzer
> : in schema.xml, there should be only the analyzer definition.
>
> Right. What's happening here is that since a "class" is specified for the
> analyzer, it is ignoring the tokenizer and token filters listed. I've opened a
> bug to add better error checking to catch these kinds of configuration
> mistakes:
>
> https://issues.apache.org/jira/browse/SOLR-3683
>
>
> -Hoss
Re: Significance of Analyzer Class attribute
Posted by Chris Hostetter <ho...@fucit.org>.
: > When I specify an analyzer class in the schema, as below, and run
: > analysis on this field on the analysis page, I can't see
: > verbose output for the tokenizer and filters
The reason is that if you use an explicit Analyzer
implementation, the analysis tool doesn't know what the individual phases
of the TokenFilters are -- the Analyzer API doesn't expose that
information (some Analyzers may be monolithic and not made up of
individual TokenFilters).
: > <fieldType name="text_chinese"
: > class="solr.TextField">
: > <analyzer
: > class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
: > <tokenizer
...
: The config above is wrong. You cannot combine an analyzer class
: with tokenizer and filter elements. If you want to use a Lucene analyzer
: in schema.xml, there should be only the analyzer definition.
Right. What's happening here is that since a "class" is specified for the
analyzer, it is ignoring the tokenizer and token filters listed. I've opened a
bug to add better error checking to catch these kinds of configuration
mistakes:
https://issues.apache.org/jira/browse/SOLR-3683
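To make the behavior described above concrete, here is the original configuration again with comments marking what is used and what is skipped. The comments reflect only the behavior reported in this thread; a Solr version that includes the stricter checking requested in SOLR-3683 may reject this configuration instead of silently ignoring parts of it.

<fieldType name="text_chinese" class="solr.TextField">
  <!-- class is set, so this Analyzer is used as-is -->
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
    <!-- ignored because the analyzer element has a class attribute -->
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <!-- ignored for the same reason -->
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>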
-Hoss
Re: Significance of Analyzer Class attribute
Posted by Ahmet Arslan <io...@yahoo.com>.
> When I specify an analyzer class in the schema, as below, and run
> analysis on this field on the analysis page, I can't see
> verbose output for the tokenizer and filters
>
> <fieldType name="text_chinese"
> class="solr.TextField">
> <analyzer
> class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
> <tokenizer
> class="solr.SmartChineseSentenceTokenizerFactory"/>
> <filter
> class="solr.SmartChineseWordTokenFilterFactory"/>
> </analyzer>
> </fieldType>
>
>
> *But if I don't add the analyzer class, I can see the verbose
> output based on the tokenizer and filters applied.*
The config above is wrong. You cannot combine an analyzer class with tokenizer and filter elements. If you want to use a Lucene analyzer in schema.xml, there should be only the analyzer definition.
It is highly recommended to use Solr's charFilter(s), tokenizer, and tokenFilter(s) in schema.xml.
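As a sketch of that recommended layout, an analyzer can list its charFilter(s) first, then exactly one tokenizer, then its filter(s). The field type name text_example and the particular factories chosen here are illustrative assumptions, not taken from this thread.

<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <!-- charFilters run on the raw character stream before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- exactly one tokenizer splits the text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- filters then run in the order listed -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>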