Posted to solr-user@lucene.apache.org by Stéphane Tellier <st...@cgi.com> on 2009/04/06 17:06:53 UTC

Stemming and ISO Latin Accent filters together

Hi,

    We're trying to use the French Stemmer filter together with the ISO
Latin Accent filter on our index, but unfortunately some searches behave
badly. After many tries, I've found that the French Stemmer (or Snowball
with language = "french") seems to be too sensitive to accents. For example,
we have a couple of documents containing the word "publiée". Normally,
searching for "publiée", "publié", "publiee" or "publie" should all be
equivalent and return the same results. But in this case, "publie" and
"publiee" do not work at all. I tried the same words after disabling
stemming and re-indexing, and indeed, the results were good.
I've also tried changing the order of the filters in the schema, but
unfortunately that causes other kinds of problems.
I know this is probably more a question for the Lucene community, but I'm
curious whether anyone using Solr with such a language has run into the
same behavior and found a trick to fix it, for example by using another
filter or the protected words (protwords) list feature of Snowball.
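
To make the four variants above concrete, here is a minimal Python sketch of
what accent folding alone does to them. This is not Solr or Lucene code: the
fold_accents helper is a hypothetical stand-in that strips combining marks
after Unicode decomposition, which only approximates the ISO Latin Accent
filter. It shows that folding leaves two distinct forms, so the stemmer still
has to bring unaccented "publiee" and "publie" to the same stem.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Rough stand-in for an accent filter (an assumption, not Solr's code):
    decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The four spellings from the message that should all match the same documents.
variants = ["publiée", "publié", "publiee", "publie"]

# Map each spelling to its accent-folded form.
folded = {word: fold_accents(word) for word in variants}

# Folding alone does not unify the variants: two forms remain.
distinct_forms = sorted(set(folded.values()))

if __name__ == "__main__":
    for word, form in folded.items():
        print(f"{word} -> {form}")
    print(distinct_forms)
```

If the stemmer runs before the folding step, it sees the accented forms
instead, so the index-time and query-time chains have to apply the two steps
in the same order for the terms to line up.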

Thanks.