Posted to dev@lucene.apache.org by "Gavin (JIRA)" <ji...@apache.org> on 2015/10/12 15:33:05 UTC
[jira] [Moved] (SOLR-8159) Tokenizing Chinese strings using lucene Chinese analyzer
[ https://issues.apache.org/jira/browse/SOLR-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gavin moved INFRA-10577 to SOLR-8159:
-------------------------------------
INFRA-Members: (was: [infrastructure-team])
Workflow: classic default workflow (was: INFRA Workflow)
Key: SOLR-8159 (was: INFRA-10577)
Project: Solr (was: Infrastructure)
> Tokenizing Chinese strings using lucene Chinese analyzer
> --------------------------------------------------------
>
> Key: SOLR-8159
> URL: https://issues.apache.org/jira/browse/SOLR-8159
> Project: Solr
> Issue Type: Bug
> Reporter: Srimanth Bangalore Krishnamurthy
> Priority: Minor
>
> The text that is indexed: 校准的卡尔曼滤波器
> Query string: 卡尔曼滤波
> The exact query string is present in a document indexed in Solr, but the query does not return that document.
> Solr analysis shows the following terms at index time:
> 的卡
> 尔
> 曼
> 滤波器
> but the query string is analyzed into these terms:
> 卡
> 尔
> 曼
> 滤波
> The surrounding characters appear to influence how 卡尔曼滤波 is tokenized.
> Is this expected behavior?
> Here is what I have tried:
> 1) I tried a couple of different tokenizers, and the behavior is the same.
> 2) I explored the option of using a dictionary, but I found this:
> https://issues.apache.org/jira/browse/LUCENE-1817
> 3) I tried using the following with text_zh for Chinese documents:
> a) solr.KeywordMarkerFilterFactory
> b) solr.StemmerOverrideFilterFactory
> c) Adding entries to synonyms.txt
> All of these seem to work only with text_en and have no effect on text_zh.
> Are there any options I can try to make sure that the query returns this document?
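
To make the mismatch described above concrete, here is a minimal sketch in plain Python (no Solr or Lucene involved). The two term lists are copied from the analysis output quoted in the report. The helper names missing_terms and bigrams are hypothetical, introduced only for this illustration. The bigram comparison is an assumption about one possible alternative (context-independent overlapping bigrams, similar in spirit to Solr's solr.CJKBigramFilterFactory); it is not something the report tested, and it ignores term positions and phrase matching.

```python
# Term lists copied from the analysis output quoted in the report.
index_terms = ["的卡", "尔", "曼", "滤波器"]
query_terms = ["卡", "尔", "曼", "滤波"]


def missing_terms(query, index):
    """Return the query terms that have no exact match among the indexed terms."""
    indexed = set(index)
    return [term for term in query if term not in indexed]


def bigrams(text):
    """Split text into overlapping character bigrams (hypothetical helper).

    This segmentation does not depend on neighbouring words, unlike the
    word segmentation shown in the report.
    """
    return [text[i:i + 2] for i in range(len(text) - 1)]


# With the segmentation from the report, two query terms are never indexed.
unmatched = missing_terms(query_terms, index_terms)
print(unmatched)

# Assumption: with overlapping bigrams, every query bigram is also produced
# from the indexed text, because the query is a substring of that text.
indexed_bigrams = bigrams("校准的卡尔曼滤波器")
query_bigrams = bigrams("卡尔曼滤波")
print(missing_terms(query_bigrams, indexed_bigrams))
```

The first check shows why the document is not returned when all query terms are required: 卡 and 滤波 exist only inside the longer indexed terms 的卡 and 滤波器. The second check only suggests that a context-independent segmentation would avoid this particular mismatch; whether it is acceptable for a given index is a separate question.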
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)