Posted to solr-user@lucene.apache.org by Jochen Lienhard <li...@ub.uni-freiburg.de> on 2013/08/02 10:53:59 UTC
ICUTransformFilterFactory
Hello,
we have a problem with some special characters, for example æ.
We are using the ICUTransformFilterFactory for indexing and searching.
We have some documents with "urianae" and some with "urianæ".
If I search for "urianae", I find only the documents with "urianae", but not
those with "urianæ".
Only if I search for "urianae*" do I find both versions.
Is it possible (perhaps with special IDs in the
ICUTransformFilterFactory) to find all of them without an asterisk?
Greetings from Germany
Jochen Lienhard
--
Dr. rer. nat. Jochen Lienhard
Dezernat EDV
Albert-Ludwigs-Universität Freiburg
Universitätsbibliothek
Rempartstr. 10-16 | Postfach 1629
79098 Freiburg | 79016 Freiburg
Telefon: +49 761 203-3908
E-Mail: lienhard@ub.uni-freiburg.de
Internet: www.ub.uni-freiburg.de
Re: ICUTransformFilterFactory
Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(13/08/02 17:53), Jochen Lienhard wrote:
> Hello,
>
> we have a problem with some special characters, for example æ.
>
> We are using the ICUTransformFilterFactory for indexing and searching.
>
> We have some documents with "urianae" and some with "urianæ".
>
> If I search for "urianae", I find only the documents with "urianae", but not those with "urianæ".
> Only if I search for "urianae*" do I find both versions.
>
> Is it possible (perhaps with special IDs in the ICUTransformFilterFactory) to find all of them
> without an asterisk?
Why don't you use MappingCharFilter?
https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
(attached at https://issues.apache.org/jira/browse/SOLR-822 )
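For illustration, a minimal sketch of how a MappingCharFilter might be wired into schema.xml. The field type name "text_folded" and the mapping file name "mapping-ae.txt" are hypothetical and not taken from this thread. The tokenizer and lowercase filter are only placeholders for whatever the existing analysis chain uses.

```xml
<!-- Hypothetical field type; name and mapping file are assumptions. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> applies to both indexing and searching,
       so "æ" is rewritten the same way on both sides. -->
  <analyzer>
    <!-- Char filters run before the tokenizer and rewrite the raw text.
         mapping-ae.txt (UTF-8, in the core's conf directory) would hold
         lines such as:
           "æ" => "ae"
           "Æ" => "AE"
    -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ae.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, existing documents would need to be reindexed before a search for "urianae" can also match "urianæ". The Analysis page in the Solr admin UI can be used to check what both spellings are turned into.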
koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html