Posted to solr-user@lucene.apache.org by Himanshu Jindal <hi...@gmail.com> on 2012/08/03 18:57:06 UTC

Using Solr-319 with Solr 3.6.0

Hi,

I am trying to implement a Solr-based search engine for Japanese.
I am having trouble adding synonym support for Japanese.
I am using text_ja for my indexed text, and I have the following entry in
schema.xml for it.

<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_ja.txt"
            ignoreCase="true" expand="true"
            tokenFactory="solr.JapaneseTokenizerFactory"
            randomAttribute="randomValue"/>
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory"
            tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory"
            tags="lang/stoptags_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_ja.txt" enablePositionIncrements="true"/>
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Here is my synonyms file (synonyms_ja.txt):

国民経済計算, 国内総生産

I verified the issue on the Solr analysis page, using field type "text_ja"
with 国民経済計算 as the index text and 国内総生産 as the query text. Ideally,
the synonym filter should apply the synonym at index time. However, it does
not. The reason appears to be that the synonyms are not tokenized, even
though I specified the tokenFactory on the synonym filter in schema.xml. I
verified this by changing the synonym file to 国民, 国内. Now, when I specify
the text as 国民 and the query as 国内, I get a match, because the tokenizer
does not split these terms and the synonyms match exactly.

I am using Solr 3.6. SOLR-319 was resolved in 2008 and should have been
part of 3.6.0.
Is there any reason why SOLR-319 is not taking effect in my Solr? Do I have
to apply the patch, or is there some setting that I can change?

Thank you so much for your time and cooperation
Himanshu Jindal

Re: Using Solr-319 with Solr 3.6.0

Posted by Robert Muir <rc...@gmail.com>.

On Fri, Aug 3, 2012 at 12:57 PM, Himanshu Jindal
<hi...@gmail.com> wrote:
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms_ja.txt"
> ignoreCase="true" expand="true"
> tokenFactory="solr.JapaneseTokenizerFactory" randomAttribute="randomValue"/>

I think you have a typo here: it should be tokenizerFactory, not tokenFactory.
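
As a sketch of that suggestion, the filter entry with only the attribute name
changed might look like the following. All other attributes are copied
unchanged from the original post, and whether this alone resolves the
synonym matching is an assumption that was not confirmed in this thread.

```xml
<!-- Sketch only: tokenFactory renamed to tokenizerFactory; every other
     attribute is left exactly as posted in the original message. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms_ja.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.JapaneseTokenizerFactory"
        randomAttribute="randomValue"/>
```

After reloading the schema, the analysis page check described in the
original post (国民経済計算 as index text, 国内総生産 as query text) would be
one way to see whether the synonym is now applied.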

-- 
lucidimagination.com