Posted to solr-user@lucene.apache.org by Rajani Maski <ra...@gmail.com> on 2012/07/25 10:15:59 UTC

Significance of Analyzer Class attribute

Hi,

What is the significance of the class attribute on the analyzer element?

When I specify an analyzer class in the schema, as below, and run
analysis on this field in the analysis page, I can't see verbose output
for the tokenizer and filters:

<fieldType name="text_chinese" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>


*But if I don't add the analyzer class, I can see the verbose output for
the tokenizer and filters applied.*

<fieldType name="text_chinese" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>

Why can't I see it in the first case? What happens when I specify the
analyzer class? Does Solr use a default if I don't set the class
attribute on the analyzer tag?



Thanks & Regards
Rajani

Re: Significance of Analyzer Class attribute

Posted by Lance Norskog <go...@gmail.com>.

An Analyzer object is a chain of a Tokenizer and TokenFilters. The text
type definitions in schema.xml either name an Analyzer class or describe
the Tokenizer and TokenFilters directly. An Analyzer class creates its
own sequence of a Tokenizer and possibly TokenFilters, hard-coded in the
class. In schema.xml you will find text types with Tokenizer/Filter
chains, or with just an Analyzer.

Take the Analyzer class out of the specification.
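
For reference, the "just an Analyzer" form mentioned above might look like the sketch below. It reuses the field type name and analyzer class from the original question; it assumes the jar that provides this Lucene analyzer is on Solr's classpath.

```xml
<!-- Sketch: a field type defined only by a Lucene Analyzer class.
     There are no <tokenizer> or <filter> children; the analyzer
     builds its own chain internally. -->
<fieldType name="text_chinese" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>
```

The other valid form is the tokenizer/filter chain with no class attribute on the analyzer element, as in the second definition in the original question.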

On Wed, Jul 25, 2012 at 5:19 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>
>> When I specify an analyzer class in the schema, as below, and run
>> analysis on this field in the analysis page, I can't see verbose output
>> for the tokenizer and filters
>>
>> <fieldType name="text_chinese" class="solr.TextField">
>>   <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
>>     <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
>>     <filter class="solr.SmartChineseWordTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> *But if I don't add the analyzer class, I can see the verbose output for
>> the tokenizer and filters applied.*
>
> The above config is wrong: you cannot combine an analyzer class with a
> tokenizer and filters. If you want to use a Lucene analyzer in schema.xml,
> there should be only the analyzer definition, with no tokenizer or filter
> elements.
>
> It is highly recommended to use Solr's charFilter(s), tokenizer, and
> tokenFilter(s) in schema.xml.



-- 
Lance Norskog
goksron@gmail.com

Re: Significance of Analyzer Class attribute

Posted by Rajani Maski <ra...@gmail.com>.

Hi All,

  Thank you for the replies.



--Regards
Rajani


On Fri, Jul 27, 2012 at 9:58 AM, Chris Hostetter <ho...@fucit.org> wrote:

> : > When I specify an analyzer class in the schema, as below, and run
> : > analysis on this field in the analysis page, I can't see verbose output
> : > for the tokenizer and filters
>
> The reason is that if you use an explicit Analyzer implementation, the
> analysis tool doesn't know what the individual phases of the token
> filters are: the Analyzer API doesn't expose that information (some
> Analyzers may be monolithic and not made up of individual TokenFilters).
>
> : > <fieldType name="text_chinese" class="solr.TextField">
> : >   <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
> : >     <tokenizer
>         ...
>
> : The above config is wrong: you cannot combine an analyzer class with a
> : tokenizer and filters. If you want to use a Lucene analyzer in schema.xml,
> : there should be only the analyzer definition.
>
> Right. What's happening here is that since a "class" is specified for the
> analyzer, Solr is ignoring the tokenizer and token filters listed. I've
> opened a bug to add better error checking to catch these kinds of
> configuration mistakes:
>
> https://issues.apache.org/jira/browse/SOLR-3683
>
> -Hoss

Re: Significance of Analyzer Class attribute

Posted by Chris Hostetter <ho...@fucit.org>.

: > When I specify an analyzer class in the schema, as below, and run
: > analysis on this field in the analysis page, I can't see verbose output
: > for the tokenizer and filters

The reason is that if you use an explicit Analyzer implementation, the
analysis tool doesn't know what the individual phases of the token
filters are: the Analyzer API doesn't expose that information (some
Analyzers may be monolithic and not made up of individual TokenFilters).


: > <fieldType name="text_chinese" class="solr.TextField">
: >   <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
: >     <tokenizer
	...

: The above config is wrong: you cannot combine an analyzer class with a
: tokenizer and filters. If you want to use a Lucene analyzer in schema.xml,
: there should be only the analyzer definition.

Right. What's happening here is that since a "class" is specified for the
analyzer, Solr is ignoring the tokenizer and token filters listed. I've
opened a bug to add better error checking to catch these kinds of
configuration mistakes:

https://issues.apache.org/jira/browse/SOLR-3683
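
As a sketch of how the configuration from the question is read, going by the behaviour described above (that is, without any stricter checking that SOLR-3683 may add), the mixed definition can be annotated as follows. The comments are explanatory annotations only, not Solr output.

```xml
<fieldType name="text_chinese" class="solr.TextField">
  <!-- "class" is set, so this Analyzer implementation is used as-is -->
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
    <!-- ignored, because the enclosing analyzer names a class -->
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <!-- ignored for the same reason -->
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>
```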


-Hoss

Re: Significance of Analyzer Class attribute

Posted by Ahmet Arslan <io...@yahoo.com>.

> When I specify an analyzer class in the schema, as below, and run
> analysis on this field in the analysis page, I can't see verbose output
> for the tokenizer and filters
>
> <fieldType name="text_chinese" class="solr.TextField">
>   <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
>     <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
>     <filter class="solr.SmartChineseWordTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> *But if I don't add the analyzer class, I can see the verbose output for
> the tokenizer and filters applied.*

The above config is wrong: you cannot combine an analyzer class with a tokenizer and filters. If you want to use a Lucene analyzer in schema.xml, there should be only the analyzer definition, with no tokenizer or filter elements.

It is highly recommended to use Solr's charFilter(s), tokenizer, and tokenFilter(s) in schema.xml.
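
As an illustration of that recommendation, a chain built only from Solr factories could look like the sketch below. The tokenizer and word filter are the ones from the question; the HTMLStripCharFilterFactory and LowerCaseFilterFactory entries are assumptions, added only to show where a charFilter and an extra filter would go, and may not be what a given Chinese-text setup needs.

```xml
<fieldType name="text_chinese" class="solr.TextField">
  <!-- No class attribute on <analyzer>: the chain is spelled out,
       so the analysis page can show each stage -->
  <analyzer>
    <!-- Assumption: optional charFilter, runs on raw text before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- From the question: exactly one tokenizer -->
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <!-- From the question: splits sentences into words -->
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <!-- Assumption: an extra token filter, applied in the order listed -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Stages run in document order: charFilters first, then the tokenizer, then each filter.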