Posted to solr-user@lucene.apache.org by David Woodward <dw...@loc.gov> on 2009/01/21 15:35:18 UTC
Words that need protection from stemming, i.e., protwords.txt
Hi.
Any good protwords.txt out there?
In a fairly standard Solr analyzer chain, we use the English Porter stemming filter like so:
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
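For context, here is a rough sketch of where that filter might sit in a field type. The field type name and the other filters are only assumptions modeled on a typical example schema, not our exact configuration:

```xml
<!-- Hypothetical field type; the name "text_en_protected" is made up for illustration. -->
<fieldType name="text_en_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercasing runs before the stemmer, so the stemmer sees "maine", not "Maine". -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Tokens listed in protwords.txt (one term per line, "#" starts a comment)
         are passed through unstemmed. -->
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

With a chain ordered like this, entries in protwords.txt would need to be lowercase to match the tokens that reach the stemmer.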
For most purposes the Porter stemmer does just fine, but occasionally words come along that really don't work out too well, e.g.,
"maine" is stemmed to "main" - clearly goofing up precision about "Maine" without doing much good for variants of "main".
So - I have an entry for my protwords.txt. What else should go in there?
Thanks for your ideas,
Dave Woodward