Posted to solr-user@lucene.apache.org by soundarya <so...@gmail.com> on 2016/10/25 09:24:01 UTC

Solr 5.3.1 - Synonym is not working as expected

We are using Solr 5.3.1 as our search engine. The setup is provided by the
Bitnami cloud, and the Amazon AMI is ami-50a47e23.

We have a website with content in Chinese. We use the Nutch crawler to crawl
the entire website and index it into a Solr collection. We have configured a
few fields, including a text field with Chinese tokenizers. When users search
with Chinese characters, we see relevant results. We want users to see the
same results when they type English or Pinyin. So we added a synonym file and
the corresponding synonym filter to schema.xml. After these changes, we do
not get any results. Below is our configuration in schema.xml. The synonym
file maps each Chinese word to its equivalent English and Pinyin words.

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_cn.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>


The debug output for the query is below. The synonym configured for the
English word is picked up, but the query returns no results:

"rawquerystring":"nasonex",
"querystring":"nasonex",
"parsedquery":"(text:nasonex text:内舒拿)/no_coord",
"parsedquery_toString":"text:nasonex text:内舒拿",
"QParser":"LuceneQParser"


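Below is the output of the analysis tool (ST = StandardTokenizer,
SF = SynonymFilter, CJKWF = CJKWidthFilter, LCF = LowerCaseFilter,
CJKBF = CJKBigramFilter):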

Stage  text     raw_bytes                     start  end  positionLength  type        position
ST     nasonex  [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
SF     nasonex  [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
       内舒拿   [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1
CJKWF  nasonex  [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
       内舒拿   [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1
LCF    nasonex  [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
       内舒拿   [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1
CJKBF  nasonex  [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
       内舒拿   [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1



Please help us with this issue.




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-5-3-1-Synonym-is-not-working-as-expected-tp4302913.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 5.3.1 - Synonym is not working as expected

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,

If your index is pure Chinese, I would do the expansion at query time only.
Simply replace English query term with Chinese translations.

Ahmet
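One way to do that replacement inside Solr rather than in the client is
sketched below. It is a minimal sketch under assumptions, not a tested
configuration. It assumes the empty result comes from a token mismatch: the
analysis output above shows 内舒拿 leaving CJKBigramFilterFactory as one whole
SYNONYM token, whereas Chinese text in the indexed documents is presumably
split by StandardTokenizerFactory into single ideographs and then joined into
the bigrams 内舒 and 舒拿, so the query term text:内舒拿 never matches. The
sketch swaps the synonym filter for a query-side solr.MappingCharFilterFactory,
which rewrites the English or Pinyin text into Chinese before tokenization, so
the query is bigrammed the same way as the documents. The file name
mapping-cn-query.txt is hypothetical; each of its lines would look like
"nasonex" => "内舒拿", and a Pinyin line such as "neishuna" => "内舒拿" is
likewise only an illustration.

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: the original chain without the synonym filter. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
  <!-- Query side: replace English/Pinyin strings with Chinese text before
       tokenizing, so the Chinese is split and bigrammed like indexed text.
       mapping-cn-query.txt is a hypothetical file name. -->
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-cn-query.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```

After reloading the core, the analysis tool for the query nasonex should show
内舒拿 after the char filter and the bigrams 内舒 and 舒拿 at the end of the
chain, which is a quick way to check the assumption. The char filter matches
exact, case-sensitive strings (even inside longer words), so variants such as
Nasonex would need their own lines. Reindexing after changing the index-side
chain is the safe course.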



On Tuesday, October 25, 2016 12:30 PM, soundarya <so...@gmail.com> wrote: