Posted to solr-user@lucene.apache.org by Jochen Lienhard <li...@ub.uni-freiburg.de> on 2013/08/02 10:53:59 UTC

ICUTransformFilterFactory

Hello,

we have a problem with some special characters, for example æ.

We are using the ICUTransformFilterFactory for indexing and searching.

We have some documents with "urianae" and with "urianæ"

If I search for "urianae", I find only the documents with "urianae" but
not those with "urianæ".
Only if I search for "urianae*" do I find both versions.

Is it possible (perhaps via special IDs in the
ICUTransformFilterFactory) to find all of them without an asterisk?

Greetings from Germany

Jochen Lienhard

-- 
Dr. rer. nat. Jochen Lienhard
Dezernat EDV

Albert-Ludwigs-Universität Freiburg
Universitätsbibliothek
Rempartstr. 10-16  | Postfach 1629
79098 Freiburg     | 79016 Freiburg

Telefon: +49 761 203-3908
E-Mail: lienhard@ub.uni-freiburg.de
Internet: www.ub.uni-freiburg.de

Re: ICUTransformFilterFactory

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

(13/08/02 17:53), Jochen Lienhard wrote:
> Hello,
>
> we have a problem with some special characters, for example æ.
>
> We are using the ICUTransformFilterFactory for indexing and searching.
>
> We have some documents with "urianae" and with "urianæ"
>
> If I search for "urianae", I find only the documents with "urianae" but not those with "urianæ".
> Only if I search for "urianae*" do I find both versions.
>
> Is it possible (perhaps via special IDs in the ICUTransformFilterFactory) to find all of them
> without an asterisk?

Why don't you use MappingCharFilter?

https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
(attached at https://issues.apache.org/jira/browse/SOLR-822 )
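
The idea behind a character-mapping filter can be sketched outside of Solr. The following is only a minimal illustration in Python, not Solr code: the mapping table, the function names and the toy index are all assumptions made up for this example. It shows why applying the same mapping (æ to ae) at both index time and query time lets a plain query match both spellings without a wildcard.

```python
# Hypothetical mapping table, in the spirit of a char filter mapping file.
# The entries are illustrative assumptions, not a complete rule set.
CHAR_MAP = str.maketrans({"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"})


def normalize(text: str) -> str:
    """Apply the character mapping, then lowercase the result."""
    return text.translate(CHAR_MAP).lower()


def build_index(docs: dict) -> dict:
    """Build a toy inverted index: normalized token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in normalize(text).split():
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index: dict, query: str) -> list:
    """Normalize the query the same way as the documents, then look it up."""
    return sorted(index.get(normalize(query), set()))


docs = {1: "urianae", 2: "urianæ"}
index = build_index(docs)

print(search(index, "urianae"))  # [1, 2]
print(search(index, "urianæ"))   # [1, 2]
```

In Solr itself the equivalent would be configured in the field type's analyzer chain rather than written by hand; the point of the sketch is only that the mapping has to run on both the indexed text and the query.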

koji
-- 
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html