Posted to solr-user@lucene.apache.org by sa...@accenture.com on 2016/06/30 12:00:23 UTC
RE: Solr 5.3.1 - Synonym is not working as expected

Hi Team,

Hope you are doing well.

We are using Solr 5.3.1 as our search engine. This setup is provided by the Bitnami cloud, and the Amazon AMI is ami-50a47e23.

We have a website with content in Chinese. We use the Nutch crawler to crawl the entire website and index it into a Solr collection. We have configured a few fields, including the text field, with Chinese tokenizers. When a user searches with Chinese characters, we see the relevant results. We want to see the same results when the user types English or Pinyin characters, so we added a synonym file and the corresponding synonym filter to the schema.xml file. After these changes, we do not get any results for English or Pinyin queries. The synonym file maps each Chinese word to its equivalent English and Pinyin words. Below is the configuration we added in schema.xml.

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_cn.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
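
To make our understanding of the last filter in this chain concrete, here is a small Python model of the tokenizer and bigram steps. It is only a sketch under assumptions, not the Lucene implementation: the names standard_tokenize and cjk_bigram are ours, the Han range check is a simplification, and we assume the bigram filter only joins tokens typed <IDEOGRAPHIC> and passes every other type, such as SYNONYM, through unchanged. We also assume the synonym rule does not fire on the single-character tokens at index time.

```python
# Simplified model of StandardTokenizer + CJKBigramFilter (an assumption-laden sketch).
# A token is a (text, type) pair; type names mirror those shown by the analysis tool.

def standard_tokenize(text):
    """Rough stand-in: each Han character becomes its own <IDEOGRAPHIC> token,
    runs of other letters/digits become one <ALPHANUM> token."""
    tokens, word = [], ""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":      # simplified Han check
            if word:
                tokens.append((word, "<ALPHANUM>"))
                word = ""
            tokens.append((ch, "<IDEOGRAPHIC>"))
        elif ch.isalnum():
            word += ch
        elif word:                           # separator ends the current word
            tokens.append((word, "<ALPHANUM>"))
            word = ""
    if word:
        tokens.append((word, "<ALPHANUM>"))
    return tokens

def cjk_bigram(tokens):
    """Rough stand-in: adjacent <IDEOGRAPHIC> tokens are joined into overlapping
    bigrams; tokens of any other type pass through unchanged."""
    out, run = [], []

    def flush():
        if len(run) == 1:
            out.append((run[0], "<SINGLE>"))
        for a, b in zip(run, run[1:]):
            out.append((a + b, "<DOUBLE>"))
        run.clear()

    for text, typ in tokens:
        if typ == "<IDEOGRAPHIC>":
            run.append(text)
        else:
            flush()
            out.append((text, typ))
    flush()
    return out

# Index side: Chinese page text goes through the tokenizer, then the bigram step.
indexed_terms = [t for t, _ in cjk_bigram(standard_tokenize("内舒拿"))]

# Query side: tokens as they look after the synonym filter for the query "nasonex".
query_terms = [t for t, _ in cjk_bigram([("nasonex", "<ALPHANUM>"), ("内舒拿", "SYNONYM")])]

print(indexed_terms)                           # ['内舒', '舒拿']
print(query_terms)                             # ['nasonex', '内舒拿']
print(set(indexed_terms) & set(query_terms))   # set()
```

If these assumptions hold, the injected synonym stays a single three-character term while the indexed Chinese text is stored as two-character bigrams, so the two sides would share no terms. That might explain the empty result, but we have not confirmed it.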


The query debug output is shown below. The synonym configured for the English word is picked up, but we see no results.

    "rawquerystring":"nasonex",
    "querystring":"nasonex",
    "parsedquery":"(text:nasonex text:内舒拿)/no_coord",
    "parsedquery_toString":"text:nasonex text:内舒拿",
    "QParser":"LuceneQParser"
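
The parsed query is consistent with how we understand comma-separated synonym groups with expand="true" and ignoreCase="true": each term in a group expands to every term in that group. The following Python sketch illustrates that reading. The function names are ours, the rule line is a hypothetical entry rather than a copy of our synonyms_cn.txt, and explicit "=>" mappings are not handled.

```python
def parse_synonyms(lines):
    """Parse comma-separated equivalence groups; every term maps to its whole group.
    Terms are lowercased to mimic ignoreCase="true"."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        group = [t.strip().lower() for t in line.split(",") if t.strip()]
        for term in group:
            groups[term] = group
    return groups

def expand_term(term, groups):
    """Return the query term followed by its synonyms (expand="true" behaviour)."""
    term = term.lower()
    others = [t for t in groups.get(term, []) if t != term]
    return [term] + others

# Hypothetical rule, for illustration only.
rules = ["# Chinese word, English word", "内舒拿, nasonex"]
groups = parse_synonyms(rules)

print(" ".join("text:" + t for t in expand_term("nasonex", groups)))
# text:nasonex text:内舒拿
```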


Below is the output when we try to use the analysis tool.

ST
text       raw_bytes                     start  end  positionLength  type        position
nasonex    [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1

SF
text       raw_bytes                     start  end  positionLength  type        position
nasonex    [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
内舒拿     [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1

CJKWF
text       raw_bytes                     start  end  positionLength  type        position
nasonex    [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
内舒拿     [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1

LCF
text       raw_bytes                     start  end  positionLength  type        position
nasonex    [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
内舒拿     [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1

CJKBF
text       raw_bytes                     start  end  positionLength  type        position
nasonex    [6e 61 73 6f 6e 65 78]        0      7    1               <ALPHANUM>  1
内舒拿     [e5 86 85 e8 88 92 e6 8b bf]  0      7    1               SYNONYM     1

Please help us with this issue, and let us know whether this is the proper channel for raising it.



Thanks and regards

Santosh Kumar Turangi
MERCK | >Accenture |
Mobile: 08008633009
Email: Santoshkumar.turangi@accenture.com<ma...@accenture.com>

