You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Renaud Paquay <rp...@isis.be> on 2006/12/22 10:54:06 UTC

French stemmer problem

Hello,

Does anyone know about a modified version of the French Stemmer ?
This one has too many bad results.
For example, if I use the word : "ours" (bear)
The stemmer stemm it into "our".....which doesn't exist in French.
If I have some words like "L'insepecteur" the index process using the
stemmer doesn't work correctly

So the problem is that the results is not accurate

Someone could help ?

Thanks,

Renaud Paquay
Developer and Network Manager
ISIS SA
Rue des Deportes 120
B-4800 Verviers
Tel: +32-(0)87.23.06.90
Fax: +32-(0)87.23.06.54
email : rp@isis.be
url : www.isis.be / www.4dbenelux.be



Re: French stemmer problem

Posted by Patrick Turcotte <pa...@gmail.com>.
Hi Renaud,

Maybe you should take a look at the Morphalou project (
http://actarus.atilf.fr/lexiques/morphalou/) it is a database of lemma and
forms in French.

You could extract the data and create a synonym index or something.

Don't hesitate to contact me off list (and in French if needed) for more
info.

Patrick

On 12/22/06, Mark Miller <ma...@gmail.com> wrote:
>
> Non of the stemmers always stem to a valid word. It is not important as
> you should be stemming the query as well. The only thing that is
> important is that each word always stems to the same base. Many English
> words do not stem to real English words with the English stemmer either.
>
> Renaud Paquay wrote:
> > Hello,
> >
> > Does anyone know about a modified version of the French Stemmer ?
> > This one has too many bad results.
> > For example, if I use the word : "ours" (bear)
> > The stemmer stemm it into "our".....which doesn't exist in French.
> > If I have some words like "L'insepecteur" the index process using the
> > stemmer doesn't work correctly
> >
> > So the problem is that the results is not accurate
> >
> > Someone could help ?
> >
> > Thanks,
> >
> > Renaud Paquay
> > Developer and Network Manager
> > ISIS SA
> > Rue des Deportes 120
> > B-4800 Verviers
> > Tel: +32-(0)87.23.06.90
> > Fax: +32-(0)87.23.06.54
> > email : rp@isis.be
> > url : www.isis.be / www.4dbenelux.be
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: French stemmer problem

Posted by Mark Miller <ma...@gmail.com>.
Non of the stemmers always stem to a valid word. It is not important as 
you should be stemming the query as well. The only thing that is 
important is that each word always stems to the same base. Many English 
words do not stem to real English words with the English stemmer either.

Renaud Paquay wrote:
> Hello,
>
> Does anyone know about a modified version of the French Stemmer ?
> This one has too many bad results.
> For example, if I use the word : "ours" (bear)
> The stemmer stemm it into "our".....which doesn't exist in French.
> If I have some words like "L'insepecteur" the index process using the
> stemmer doesn't work correctly
>
> So the problem is that the results is not accurate
>
> Someone could help ?
>
> Thanks,
>
> Renaud Paquay
> Developer and Network Manager
> ISIS SA
> Rue des Deportes 120
> B-4800 Verviers
> Tel: +32-(0)87.23.06.90
> Fax: +32-(0)87.23.06.54
> email : rp@isis.be
> url : www.isis.be / www.4dbenelux.be
>
>
>
>   

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: French stemmer problem

Posted by Samir Abdou <Sa...@unine.ch>.
Hi,

Take a look to http://www.unine.ch/info/clef where you'll find valuable
resources for many languages including French. 

Samir   


-----Message d'origine-----
De : Renaud Paquay [mailto:rp@isis.be] 
Envoyé : vendredi, 22. décembre 2006 10:54
À : java-user@lucene.apache.org
Objet : French stemmer problem

Hello,

Does anyone know about a modified version of the French Stemmer ?
This one has too many bad results.
For example, if I use the word : "ours" (bear)
The stemmer stemm it into "our".....which doesn't exist in French.
If I have some words like "L'insepecteur" the index process using the
stemmer doesn't work correctly

So the problem is that the results is not accurate

Someone could help ?

Thanks,

Renaud Paquay
Developer and Network Manager
ISIS SA
Rue des Deportes 120
B-4800 Verviers
Tel: +32-(0)87.23.06.90
Fax: +32-(0)87.23.06.54
email : rp@isis.be
url : www.isis.be / www.4dbenelux.be




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org