Posted to dev@lucene.apache.org by "Maciej Niemczyk (JIRA)" <ji...@apache.org> on 2013/08/14 17:43:50 UTC
[jira] [Updated] (SOLR-5153) CollationKeyFilter returns unexpected output
[ https://issues.apache.org/jira/browse/SOLR-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Niemczyk updated SOLR-5153:
----------------------------------
Description:
Using the default setup and the example from the Solr wiki (http://wiki.apache.org/solr/UnicodeCollation),
the Solr analysis screen reports unexpected output for the CollationKeyFilter (CKF).
Settings:
{code}
<fieldType name="germanText" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
  </analyzer>
</fieldType>
<field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="germanText"/>
{code}
Input:
{code}
Peter
{code}
Output:
{code}
WT: Peter [50 65 74 65 72]
CKF: 1䀖瀅䀃᐀ [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0]
{code}
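For context, a minimal sketch of what a collation key looks like outside Solr, using only the JDK's java.text.Collator. It assumes, based on the Lucene documentation for CollationKeyFilter, that the filter replaces each token with its binary collation key and then encodes those bytes as characters so they can be stored as an index term; if that holds, non-readable CKF text would be the encoded sort key rather than corrupted input. The class and method names below (CollationKeyDemo, primaryKey, toHex) are hypothetical and not part of Solr or Lucene, and the raw bytes printed here are not expected to match the encoded term shown by the analysis screen.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class, not part of Solr or Lucene.
final class CollationKeyDemo {

    // Returns the binary sort key for a token under German rules at primary strength,
    // mirroring language="de" strength="primary" from the field type above.
    static byte[] primaryKey(String token) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(Collator.PRIMARY);
        return collator.getCollationKey(token).toByteArray();
    }

    // Formats bytes as space-separated hex, similar to the raw_bytes column.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] upper = primaryKey("Peter");
        byte[] lower = primaryKey("peter");

        // The key is an opaque byte sequence, not readable text.
        System.out.println("Peter -> [" + toHex(upper) + "]");
        System.out.println("peter -> [" + toHex(lower) + "]");

        // Primary strength ignores case, so both tokens share one key.
        System.out.println("same key: " + Arrays.equals(upper, lower));
    }
}
```

The key is only meaningful when compared against other keys produced by the same collator, which is why the same filter settings appear in both the index and query analyzers.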
was:
Given the default situation and the example from solr-wiki: http://wiki.apache.org/solr/UnicodeCollation
the solr analysis reports strange output for the CKF.
Settings:
{code}
<fieldType name="germanText" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
  </analyzer>
</fieldType>
<field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="germanText"/>
{code}
Output:
{code}
WT
text   raw_bytes         start  end  position  type
Peter  [50 65 74 65 72]  0      5    1         word

CKF
text   raw_bytes                                          position  start  end  type
1䀖瀅䀃᐀  [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0]  1         0      5    word
{code}
> CollationKeyFilter returns unexpected output
> --------------------------------------------
>
> Key: SOLR-5153
> URL: https://issues.apache.org/jira/browse/SOLR-5153
> Project: Solr
> Issue Type: Bug
> Components: SearchComponents - other
> Affects Versions: 4.3
> Environment: Mac os x
> Reporter: Maciej Niemczyk