Posted to java-user@lucene.apache.org by Aa...@globaldatapoint.com on 2008/07/16 10:58:48 UTC
Accent Insensitive Search
Hi All,
I need to implement accent-insensitive search in my application.
A simple example: a search for Kraków should also return Krakow in the results.
I have seen many threads discussing solutions with Solr, but I don't want to use Solr in my application for this feature alone.
Any suggestions?
Regards,
Aamir Yaseen
Re: Accent Insensitive Search
Posted by Wojtek H <wo...@gmail.com>.
Note that ISOLatin1AccentFilter only converts accented characters from
the ISO-8859-1 character set. This means that if you need to convert
the accents of Eastern European languages, you need to write your own
accent filter.
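
A minimal sketch of what such a custom fold could look like, assuming only
the JDK (java.text.Normalizer, Java 9+ for Map.of) and no particular Lucene
version. The class name AccentFolder and the SPECIAL table are hypothetical
and the table is illustrative, not exhaustive. The idea is to decompose each
character (NFD), drop the combining marks, and map by hand the few letters
that have no decomposition, such as the Polish l with stroke:

```java
import java.text.Normalizer;
import java.util.Map;

public class AccentFolder {
    // Letters that NFD does not decompose, so they need an explicit mapping.
    // Hypothetical, deliberately small table; extend it for your languages.
    private static final Map<Character, String> SPECIAL = Map.of(
        '\u0142', "l",  '\u0141', "L",   // l with stroke (Polish)
        '\u0111', "d",  '\u0110', "D",   // d with stroke
        '\u00f8', "o",  '\u00d8', "O",   // o with stroke
        '\u00df', "ss");                 // sharp s

    public static String fold(String input) {
        // NFD splits an accented letter into base letter + combining mark(s).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop the accent itself
            }
            String mapped = SPECIAL.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Krak\u00f3w"));         // prints "Krakow"
        System.out.println(fold("\u0141\u00f3d\u017a")); // prints "Lodz"
    }
}
```

A method like this could be called from your own TokenFilter on each term,
as long as the same filter is used in the analyzer for both indexing and
searching.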
wojtek
2008/7/16 Petite Abeille <pe...@mac.com>:
>
> On Jul 16, 2008, at 10:58 AM, Aamir.Yaseen@globaldatapoint.com wrote:
>
>> A simple example: a search for Kraków should also return Krakow in the
>> results.
>
> As pointed out previously, you need to transliterate your input using
> something like ISOLatin1AccentFilter.
>
> For example, searching for 'aaiun' should return 'Aaiún' and vice versa:
>
> http://svr225.stepx.com:3388/search?q=aaiun
> http://svr225.stepx.com:3388/el-aaiun
>
> Sean M. Burke's Unidecode provides an extensive transliteration of Unicode
> into ASCII:
>
> http://interglacial.com/~sburke/tpj/as_html/tpj22.html
>
> E.g.:
>
> Москва́ Moskva
> 北京 beijing
> Ἀθηνᾶ Athena
> 서울 seoul
> 東京 dongjing
> 京都市 jingdushi
> नेपाल nepaal
>
> Cheers,
>
> --
> PA.
> http://alt.textdrive.com/nanoki/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
Re: Accent Insensitive Search
Posted by Petite Abeille <pe...@mac.com>.
On Jul 16, 2008, at 10:58 AM, Aamir.Yaseen@globaldatapoint.com wrote:
> A simple example: a search for Kraków should also return Krakow in the
> results.
As pointed out previously, you need to transliterate your input using
something like ISOLatin1AccentFilter.
For example, searching for 'aaiun' should return 'Aaiún' and vice versa:
http://svr225.stepx.com:3388/search?q=aaiun
http://svr225.stepx.com:3388/el-aaiun
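
For the match to work in both directions, the same transliteration has to be
applied to the indexed text and to the query. Here is a rough sketch of that
idea using a plain in-memory map instead of a real Lucene index. The class
FoldedIndex and its methods are hypothetical names, and the fold only strips
combining marks and lowercases, which is an assumption made to keep the
example short:

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FoldedIndex {
    // Folded key -> original terms that fold to it.
    private final Map<String, List<String>> index = new HashMap<>();

    // Decompose, remove combining marks (\p{M}), then lowercase.
    public static String key(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Index time: store the original term under its folded key.
    public void add(String term) {
        index.computeIfAbsent(key(term), k -> new ArrayList<>()).add(term);
    }

    // Query time: fold the query the same way before the lookup.
    public List<String> search(String query) {
        return index.getOrDefault(key(query), Collections.emptyList());
    }

    public static void main(String[] args) {
        FoldedIndex idx = new FoldedIndex();
        idx.add("Aai\u00fan");
        idx.add("aaiun");
        // Both queries find both stored spellings.
        System.out.println(idx.search("aaiun").size());      // prints 2
        System.out.println(idx.search("Aai\u00fan").size()); // prints 2
    }
}
```

In Lucene terms this corresponds to using the same analyzer, accent filter
included, for both the IndexWriter and the query parser.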
Sean M. Burke's Unidecode provides an extensive transliteration of
Unicode into ASCII:
http://interglacial.com/~sburke/tpj/as_html/tpj22.html
E.g.:
Москва́ Moskva
北京 beijing
Ἀθηνᾶ Athena
서울 seoul
東京 dongjing
京都市 jingdushi
नेपाल nepaal
Cheers,
--
PA.
http://alt.textdrive.com/nanoki/