Posted to java-user@lucene.apache.org by Valerio Schiavoni <va...@gmail.com> on 2006/10/31 17:30:15 UTC

how to handle words with accent?

Hello,
I use Lucene to index documents in Italian. Many terms end with accented
letters: società, fedeltà, etc.

What happens now is that if a user searches for societa' (note the a
followed by the ' character), they don't get the same results as they would
when searching for società.

What is the best practice for handling such situations?
I haven't tuned Lucene in any way, and I'm using the default analyzer.

Thanks for any suggestions,
Valerio
-- 
http://valerioschiavoni.blogspot.com
http://jroller.com/page/vschiavoni

Re: how to handle words with accent?

Posted by Patrick Turcotte <pa...@gmail.com>.
I was referring to
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/ISOLatin1AccentFilter.html
which is in the 2.0 version of Lucene.

Patrick
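To make concrete what accent folding does to a single token, here is a minimal standalone Java sketch (Java 6+). It is not Lucene code and not how ISOLatin1AccentFilter is implemented: that filter replaces accented ISO Latin 1 characters using its own mapping, while this sketch swaps in a different technique, decomposing each string with java.text.Normalizer and dropping the combining marks. The class name AccentFolding is hypothetical.

```java
import java.text.Normalizer;

/**
 * Illustrative sketch, not Lucene code: folds accented Latin letters to their
 * base letter, similar in effect to what ISOLatin1AccentFilter does to tokens.
 * The class name AccentFolding is hypothetical.
 */
public class AccentFolding {

    /**
     * Decomposes to NFD (e.g. a-grave U+00E0 becomes 'a' followed by the
     * combining grave accent U+0300), then removes all combining marks.
     */
    public static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only:
        // U+00E0 is a-grave, U+00E9 is e-acute.
        String[] words = {"societ\u00e0", "fedelt\u00e0", "perch\u00e9"};
        for (String word : words) {
            System.out.println(word + " -> " + fold(word));
        }
    }
}
```

Here società folds to societa and fedeltà to fedelta, so an unaccented query term can match them, provided the same folding is also applied to the query.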

On 10/31/06, Valerio Schiavoni <va...@gmail.com> wrote:
>
> actually yes, it would be a good result!
> is the class you mentioned the one referred here:
>
> http://sourceforge.net/mailarchive/message.php?msg_id=11811387
>
> (which I had to extract from the sources of the project mentioned in the
> post)
> or you were referring to something else ?
>
> thanks a lot!
>
> On 10/31/06, Patrick Turcotte <pa...@gmail.com> wrote:
> >
> > Should both results be returned in both cases?
> >
> > If so, take a look at the IsoLatin1Filter class, it will remove those
> > accents for indexing and searching if needed.
> >
> > Patrick
> >
> > On 10/31/06, Valerio Schiavoni <va...@gmail.com> wrote:
> > >
> > > hello,
> > > i use lucene to index documents in Italian. many terms end with
> accented
> > > letters: società, fedeltà, etc
> > >
> > > What happen now is that if a user search for : societa' (note the a
> and
> > > the
> > > ' character), it doesn't get the same results as he would when
> searching
> > > for
> > > società.
> > >
> > > What is the best practice to handle such situations ?
> > > i haven't tuned anyhow lucene, and i'm using the default analyzer.
> > >
> > > thanks for any suggestions,
> > > valerio
> > > --
> > > http://valerioschiavoni.blogspot.com
> > > http://jroller.com/page/vschiavoni
> > >
> > >
> >
> >
>
>
> --
> http://valerioschiavoni.blogspot.com
> http://jroller.com/page/vschiavoni
>
>

Re: how to handle words with accent?

Posted by Valerio Schiavoni <va...@gmail.com>.
Actually, yes, that would be a good result!
Is the class you mentioned the one referred to here:

http://sourceforge.net/mailarchive/message.php?msg_id=11811387

(which I had to extract from the sources of the project mentioned in the
post), or were you referring to something else?

Thanks a lot!


Re: how to handle words with accent?

Posted by Patrick Turcotte <pa...@gmail.com>.
Should both results be returned in both cases?

If so, take a look at the IsoLatin1Filter class; it will remove those
accents at both indexing and search time, if needed.

Patrick
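The point about "indexing and searching" matters: folding only helps if the terms written to the index and the terms parsed from the query go through the same analysis. In Lucene that usually means passing the same Analyzer to both the IndexWriter and the QueryParser. The toy Java sketch below (Java 8+, not Lucene code) shows the idea with a hypothetical analyze() method and an in-memory inverted index. Splitting on every non-letter, non-digit character is an assumption of this sketch; it is also what discards the trailing apostrophe in the typed variant societa'.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy inverted index (not Lucene code). The hypothetical analyze() method is
 * applied both when documents are added and when queries are run, which is
 * what makes accented and unaccented spellings match.
 */
public class SymmetricAnalysis {

    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Hypothetical analysis chain: fold accents, lowercase, split into terms. */
    public static List<String> analyze(String text) {
        // Fold first, so input that is already decomposed is handled as well.
        String folded = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        List<String> terms = new ArrayList<>();
        // Any run of non-letter, non-digit characters separates terms; this
        // also drops the trailing apostrophe in "societa'".
        for (String token : folded.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Indexes a document under the terms produced by analyze(). */
    public void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the ids of documents containing every analyzed query term. */
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new TreeSet<>();
        }
        return result;
    }

    public static void main(String[] args) {
        SymmetricAnalysis index = new SymmetricAnalysis();
        index.add(1, "La societ\u00e0 italiana"); // U+00E0 is a-grave
        index.add(2, "Fedelt\u00e0 e lavoro");
        System.out.println(index.search("societ\u00e0")); // accented query
        System.out.println(index.search("societa'"));     // a followed by apostrophe
    }
}
```

Both queries are reduced to the single term societa, so both searches should return document 1. If the folding were applied only at indexing time, the accented query would find nothing.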
