Posted to user@nutch.apache.org by Goethe <ko...@hotmail.com> on 2007/10/19 15:54:25 UTC

How do I make an accent insensitive search

I've searched all over the forum, but I haven't managed to get this
to work.

I want to make the search accent-insensitive. For example, I have a file
with the word "resolução", and I want someone who types "resolucao" to
find that file.

I tried adding |analysis-(pt)| to nutch-site.xml, and search.jsp already
has the right charset, but nothing changed.

I would appreciate any help, especially a very detailed answer.
-- 
View this message in context: http://www.nabble.com/How-do-I-make-an-accent-insensitive-search-tf4653314.html#a13294989
Sent from the Nutch - User mailing list archive at Nabble.com.


RE: How do I make an accent insensitive search

Posted by Howie Wang <ho...@hotmail.com>.
You can write your own indexing plugin that calls the AccentReplacer.
Just copy the index-basic plugin code. You'll call the AccentReplacer
on any string values before calling doc.add on them. See the Nutch wiki
for more info on plugins.
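
As a rough illustration of the accent-stripping step, here is a minimal
sketch. It does not reproduce the AccentReplacer class from the linked
message; the AccentFolding class and its stripAccents method are
hypothetical stand-ins built on the JDK's java.text.Normalizer, and the
actual class may behave differently. In an indexing plugin, the idea would
be to pass each string value through a method like this before adding it
to the document.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical stand-in for the AccentReplacer class mentioned above;
// the class from the linked message may work differently.
final class AccentFolding {
    // Matches Unicode combining marks (acute, tilde, cedilla, and so on).
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    private AccentFolding() {}

    // Decomposes each character into a base letter plus combining marks
    // (normalization form NFD), then removes the marks, leaving only the
    // base letters.
    static String stripAccents(String text) {
        if (text == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }
}
```

With this sketch, stripAccents("resolução") yields "resolucao", so the
indexed term matches what a user without accent keys would type.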

You could change the webapp to remove accents from queries either 
by hacking search.jsp or by creating a query-filter that removes accents 
from user queries, but I never bothered. If your users are largely 
US/UK based, they almost never enter those accents when querying.
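
If the index holds only accent-stripped terms, a user who does type
"resolução" will miss unless the query is folded the same way. A minimal
sketch of that query-side step follows. QueryFolding and foldQuery are
hypothetical names, not part of Nutch; where the call goes (search.jsp or
a query filter) depends on the Nutch version and is left open here.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Nutch: folds a raw user query the same
// way the indexed text was folded, so both sides compare equal.
final class QueryFolding {
    private QueryFolding() {}

    // Trims the raw query, decomposes accented characters (NFD), and drops
    // the combining marks. A null query is treated as an empty one.
    static String foldQuery(String rawQuery) {
        if (rawQuery == null) {
            return "";
        }
        String decomposed =
            Normalizer.normalize(rawQuery.trim(), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

The folded string would then be handed to the normal query parsing in
place of the raw request parameter.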

Howie




RE: How do I make an accent insensitive search

Posted by Goethe <ko...@hotmail.com>.
Hi, thanks for the reply. How exactly do I do that? And don't I need to
change the webapp as well?




-- 
View this message in context: http://www.nabble.com/How-do-I-make-an-accent-insensitive-search-tf4653314.html#a13299737
Sent from the Nutch - User mailing list archive at Nabble.com.


RE: How do I make an accent insensitive search

Posted by Howie Wang <ho...@hotmail.com>.
Don't know if there have been more recent changes to this issue.
I did this for Nutch 0.7:

http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200602.mbox/%3cBAY102-F167F537EFF3E57703D0644F3FF0@phx.gbl%3e

Then I changed my indexer plugin to call this class before indexing.

Howie


