Posted to solr-user@lucene.apache.org by Bruno Mannina <bm...@free.fr> on 2021/01/10 16:56:44 UTC
[solr8.7] not relevant results for chinese query
Hello,
I am trying to index and search Chinese text.
My definition is:
<field name="tizh" type="text_zh" multiValued="true" indexed="true"
       stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<!-- Simplified Chinese -->
<!-- BRUNO -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
However, I get too many irrelevant results.
For example, with the query (phone case):
tizh:(手機殼)
my query is translated to:
tizh:(手 OR 機 OR 殼)
But:
tizh:(手 AND 機 AND 殼)
returns 0 results.
And:
tizh:"手機殼"
also returns 0 results.
Is it possible to improve my fieldType, or must I add something else?
Thanks,
Bruno
--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
RE: [solr8.7] not relevant results for chinese query
Posted by Bruno Mannina <bm...@free.fr>.
Hi,
After reading this article ( https://opensourceconnections.com/blog/2011/12/23/indexing-chinese-in-solr/ ), I am beginning to understand what is happening.
Has anyone already tried the Paoding algorithm with a recent Solr?
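One direction that may be worth testing, offered only as a sketch: the sample query 手機殼 is written in Traditional Chinese, while solr.HMMChineseTokenizerFactory targets Simplified Chinese, which may be why the query falls apart into single characters. A segmentation-free alternative is to index overlapping character bigrams with solr.StandardTokenizerFactory and solr.CJKBigramFilterFactory. The type name text_zh_bigram below is hypothetical, and whether bigrams give acceptable relevance for this data is an assumption that would need checking after a full reindex.

```xml
<!-- Hypothetical field type name; a sketch, not a tested configuration. -->
<fieldType name="text_zh_bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- StandardTokenizer emits each Han character as its own token. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalizes full-width Latin and half-width Katakana forms. -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Joins adjacent CJK tokens into overlapping bigrams: 手機殼 becomes 手機 and 機殼. -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```

The tizh field would then need type="text_zh_bigram", and the Analysis screen in the Solr admin UI can show how 手機殼 is tokenized at index and query time before committing to a reindex.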
Thanks,
Bruno