Posted to java-user@lucene.apache.org by Aa...@globaldatapoint.com on 2008/07/16 10:58:48 UTC

Accent Insensitive Search

Hi All,

I need to implement accent-insensitive search in my application.

A simple example: a search for Kraków should also return Krakow in the search results.

I have seen many threads discussing solutions with SOLR, but I don't want to use SOLR in my application just for this feature.

Any suggestions?

Regards,

Aamir Yaseen


Re: Accent Insensitive Search

Posted by Wojtek H <wo...@gmail.com>.
Note that ISOLatin1AccentFilter converts accented characters only from
the ISO-8859-1 character set. This means that if you need to fold the
accents of Eastern European languages, you need to write your own
accent filter.
wojtek
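
A rough sketch of the folding step such a custom filter might use, under some assumptions: it uses only the JDK (java.text.Normalizer), not any Lucene API, and the names AccentFolder, fold and EXTRA are made up for illustration. The idea is to decompose each character (NFD) and drop the combining marks. Letters such as the Polish l with stroke have no decomposition, so they need an explicit table; the table here is a small illustrative subset, not a complete one.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper, not part of Lucene. Non-ASCII characters are written
// as Unicode escapes so the source compiles regardless of file encoding.
final class AccentFolder {

    // Letters with no canonical decomposition; NFD leaves them unchanged,
    // so they are mapped by hand. Illustrative subset only.
    private static final Map<Character, String> EXTRA = Map.of(
            '\u0142', "l",   // l with stroke
            '\u0141', "L",   // L with stroke
            '\u0111', "d",   // d with stroke
            '\u0110', "D",   // D with stroke
            '\u00f8', "o",   // o with stroke
            '\u00d8', "O",   // O with stroke
            '\u00df', "ss"); // sharp s

    static String fold(String input) {
        // NFD splits an accented letter into base letter + combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop the combining accent, keep the base letter
            }
            String mapped = EXTRA.get(c);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Krak\u00f3w"));           // prints "Krakow"
        System.out.println(fold("\u0141\u00f3d\u017a"));   // prints "Lodz"
    }
}
```

In a real setup this logic would presumably sit inside a token filter in the analysis chain, so that the same folding is applied to every token.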

2008/7/16 Petite Abeille <pe...@mac.com>:
>
> On Jul 16, 2008, at 10:58 AM, Aamir.Yaseen@globaldatapoint.com wrote:
>
>> A simple example: a search for Kraków should also return Krakow in the
>> search results.
>
> As pointed out previously, you need to transliterate your input using
> something like ISOLatin1AccentFilter.
>
> For example, searching for 'aaiun' should return 'Aaiún' and vice versa:
>
> http://svr225.stepx.com:3388/search?q=aaiun
> http://svr225.stepx.com:3388/el-aaiun
>
> Sean M. Burke's Unidecode provides an extensive transliteration of Unicode
> into ASCII:
>
> http://interglacial.com/~sburke/tpj/as_html/tpj22.html
>
> E.g.:
>
> Москва́ Moskva
> 北京      beijing
> Ἀθηνᾶ   Athena
> 서울      seoul
> 東京      dongjing
> 京都市     jingdushi
> नेपाल   nepaal
>
> Cheers,
>
> --
> PA.
> http://alt.textdrive.com/nanoki/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Accent Insensitive Search

Posted by Petite Abeille <pe...@mac.com>.
On Jul 16, 2008, at 10:58 AM, Aamir.Yaseen@globaldatapoint.com wrote:

> A simple example: a search for Kraków should also return Krakow in the
> search results.

As pointed out previously, you need to transliterate your input using
something like ISOLatin1AccentFilter.

For example, searching for 'aaiun' should return 'Aaiún' and vice versa:

http://svr225.stepx.com:3388/search?q=aaiun
http://svr225.stepx.com:3388/el-aaiun
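
To make the "and vice versa" part concrete, here is a minimal sketch of why the same folding has to be applied both when indexing and when querying. It is a toy in-memory index using only the JDK, not Lucene and not the code behind the links above; FoldedIndex, normalize, add and search are hypothetical names.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy index for illustration only. Non-ASCII characters are written as
// Unicode escapes so the source compiles regardless of file encoding.
final class FoldedIndex {

    // Folded term -> titles containing that term.
    private final Map<String, List<String>> postings = new HashMap<>();

    // Decompose, strip combining marks, lowercase. Used on BOTH sides.
    static String normalize(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Index time: store each whitespace-separated token in folded form.
    void add(String title) {
        for (String token : title.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(normalize(token), k -> new ArrayList<>())
                    .add(title);
        }
    }

    // Query time: fold the query the same way before the lookup.
    List<String> search(String query) {
        return postings.getOrDefault(normalize(query), List.of());
    }

    public static void main(String[] args) {
        FoldedIndex index = new FoldedIndex();
        index.add("El Aai\u00fan");
        // Both the plain and the accented query find the same title.
        System.out.println(index.search("aaiun").size());       // prints 1
        System.out.println(index.search("Aai\u00fan").size());  // prints 1
    }
}
```

If only the indexed text were folded, the accented query would miss; if only the query were folded, the plain query would miss. Folding on both sides is what makes the match symmetric.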

Sean M. Burke's Unidecode provides an extensive transliteration of  
Unicode into ASCII:

http://interglacial.com/~sburke/tpj/as_html/tpj22.html

E.g.:

Москва́	Moskva
北京	beijing
Ἀθηνᾶ	Athena
서울	seoul
東京	dongjing
京都市	jingdushi
नेपाल	nepaal

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/

