Posted to dev@lucene.apache.org by "Maciej Niemczyk (JIRA)" <ji...@apache.org> on 2013/08/14 17:43:50 UTC

[jira] [Updated] (SOLR-5153) CollationKeyFilter returns unexpected output

     [ https://issues.apache.org/jira/browse/SOLR-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Niemczyk updated SOLR-5153:
----------------------------------

    Description: 
With an otherwise default setup and the example from the Solr wiki (http://wiki.apache.org/solr/UnicodeCollation),
the Solr analysis reports strange output for the CollationKeyFilter (CKF).
Settings:
{code}
<fieldType name="germanText" class="solr.TextField">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
	</analyzer>
</fieldType>

<field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="germanText"/>
{code}

Input:
{code}
Peter
{code}

Output:
{code}
WT:  Peter [50 65 74 65 72]
CKF: 1䀖瀅䀃᐀ [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0]
{code}

  was:
With an otherwise default setup and the example from the Solr wiki (http://wiki.apache.org/solr/UnicodeCollation),
the Solr analysis reports strange output for the CollationKeyFilter (CKF).
Settings:
{code}
<fieldType name="germanText" class="solr.TextField">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
	</analyzer>
</fieldType>

<field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>

<copyField source="title" dest="germanText"/>
{code}

Output:
{code}
filter | text | raw_bytes | start | end | position | type
WT | Peter | [50 65 74 65 72] | 0 | 5 | 1 | word
CKF | 1䀖瀅䀃᐀ | [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0] | 0 | 5 | 1 | word
{code}

    
> CollationKeyFilter returns unexpected output
> --------------------------------------------
>
>                 Key: SOLR-5153
>                 URL: https://issues.apache.org/jira/browse/SOLR-5153
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>    Affects Versions: 4.3
>         Environment: Mac OS X
>            Reporter: Maciej Niemczyk
>
> With an otherwise default setup and the example from the Solr wiki (http://wiki.apache.org/solr/UnicodeCollation),
> the Solr analysis reports strange output for the CollationKeyFilter (CKF).
> Settings:
> {code}
> <fieldType name="germanText" class="solr.TextField">
> 	<analyzer type="index">
> 		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 		<filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
> 	</analyzer>
> 	<analyzer type="query">
> 		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 		<filter class="solr.CollationKeyFilterFactory" language="de" strength="primary"/>
> 	</analyzer>
> </fieldType>
> <field name="germanText" type="germanText" indexed="true" stored="false" multiValued="true"/>
> <copyField source="title" dest="germanText"/>
> {code}
> Input:
> {code}
> Peter
> {code}
> Output:
> {code}
> WT:  Peter [50 65 74 65 72]
> CKF: 1䀖瀅䀃᐀ [31 e4 80 96 c e7 80 85 e4 80 83 e1 90 80 0 0 0]
> {code}
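
For comparison outside Solr, here is a minimal sketch using only the JDK. It assumes that solr.CollationKeyFilterFactory is backed by java.text.Collator, so that language="de" strength="primary" corresponds roughly to a German collator at PRIMARY strength. The class name CollationKeyDemo and the helper primaryKey are hypothetical and exist only for this illustration. The sketch does not reproduce how Solr turns the key into an index term; it only shows that a collation key is a binary sort key rather than readable text, which may be why the CKF output above looks unexpected.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; not part of Solr or Lucene.
class CollationKeyDemo {

    // Returns the raw collation key bytes for a term, using a German
    // collator at PRIMARY strength (assumed to mirror the settings
    // language="de" strength="primary" from the field type above).
    static byte[] primaryKey(String term) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(Collator.PRIMARY);
        CollationKey key = collator.getCollationKey(term);
        return key.toByteArray();
    }

    public static void main(String[] args) {
        byte[] upper = primaryKey("Peter");
        byte[] lower = primaryKey("peter");

        // The bytes form a binary sort key, not the original characters.
        System.out.println(Arrays.toString(upper));

        // Case differences are ignored at PRIMARY strength,
        // so both spellings yield the same key bytes.
        System.out.println(Arrays.equals(upper, lower));
    }
}
```

The exact byte values depend on the JDK's collation rules, so they are not expected to match the raw_bytes shown by the Solr analysis.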
