Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/02/05 16:24:08 UTC
[Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by RobertMuir
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "AnalyzersTokenizersTokenFilters" page has been changed by RobertMuir.
The comment on this change is: beef up / disambiguate the snowball docs.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=73&rev2=74
--------------------------------------------------
Example: "riding", "rides", "horses" ==> "ride", "ride", "hors".
+ Note: This stemmer differs very slightly from the "Porter" algorithm available in `solr.SnowballPorterFilter`, because this one deviates slightly from the published algorithm.
+ For more details, see the section "Points of difference from the published algorithm" on the [[http://tartarus.org/~martin/PorterStemmer/|Porter stemmer page]].
+
<<Anchor(EnglishPorterFilter)>>
==== solr.EnglishPorterFilterFactory ====
@@ -347, +350 @@
Creates `org.apache.lucene.analysis.SnowballPorterFilter`.
- Creates an [[http://snowball.tartarus.org/algorithms/english/stemmer.html|Porter2 stemmer]] from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification. The language attribute is used to specify the language of the stemmer.
+ Creates a [[http://snowball.tartarus.org/texts/stemmersoverview.html|Snowball stemmer]] from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification. The language attribute is used to specify the language of the stemmer.
{{{
<fieldtype name="myfieldtype" class="solr.TextField">
<analyzer>
@@ -358, +361 @@
}}}
Valid values for the language attribute (the filter instantiates the Snowball stemmer class named language + "Stemmer"):
- * Danish
- * Dutch
- * English
- * Finnish
- * French
- * German2
- * German
- * Italian
- * Kp
- * Lovins
- * Norwegian
- * Porter
- * Portuguese
- * Russian
- * Spanish
- * Swedish
+ * [[http://snowball.tartarus.org/algorithms/danish/stemmer.html|Danish]]
+ * [[http://snowball.tartarus.org/algorithms/dutch/stemmer.html|Dutch]]
+ * [[http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html|Kp]]: The Kraaij-Pohlmann stemming algorithm for Dutch.
+ * [[http://snowball.tartarus.org/algorithms/porter/stemmer.html|Porter]]: The original Porter stemming algorithm for English.
+ * [[http://snowball.tartarus.org/algorithms/english/stemmer.html|English]]: The Porter2 stemming algorithm for English.
+ * [[http://snowball.tartarus.org/algorithms/lovins/stemmer.html|Lovins]]: The early Lovins stemming algorithm for English.
+ * [[http://snowball.tartarus.org/algorithms/finnish/stemmer.html|Finnish]]
+ * [[http://snowball.tartarus.org/algorithms/french/stemmer.html|French]]
+ * [[http://snowball.tartarus.org/algorithms/german/stemmer.html|German]]
+ * [[http://snowball.tartarus.org/algorithms/german2/stemmer.html|German2]]: A variation of the German algorithm that allows ä, ö and ü to be represented by ae, oe and ue.
+ * [[http://snowball.tartarus.org/algorithms/italian/stemmer.html|Italian]]
+ * [[http://snowball.tartarus.org/algorithms/norwegian/stemmer.html|Norwegian]]
+ * [[http://snowball.tartarus.org/algorithms/portuguese/stemmer.html|Portuguese]]
+ * [[http://snowball.tartarus.org/algorithms/russian/stemmer.html|Russian]]
+ * [[http://snowball.tartarus.org/algorithms/spanish/stemmer.html|Spanish]]
+ * [[http://snowball.tartarus.org/algorithms/swedish/stemmer.html|Swedish]]
+ <!> Gotchas:
+ * Although the Lovins stemmer is described as faster than Porter/Porter2, in practice it is much slower in Solr, because it is implemented using reflection.
+ * Neither the Lovins nor the Finnish stemmer produces correct output (as of Solr 1.4), due to a [[http://article.gmane.org/gmane.comp.search.snowball/1139|known bug in Snowball]].
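For illustration only, here is a minimal sketch of how one of the language values listed above might be selected in a schema. The field type name `text_de` and the choice of tokenizer and lowercase filter are assumptions made for this example and do not come from the wiki page. Only the `language` attribute on `solr.SnowballPorterFilterFactory` is the point being shown.
{{{
<!-- Hypothetical field type; the name "text_de" is an assumption for this sketch. -->
<fieldtype name="text_de" class="solr.TextField">
  <analyzer>
    <!-- Tokenizer and lowercase filter chosen for illustration only. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- language must be one of the values listed above; German2 also accepts ae/oe/ue spellings. -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldtype>
}}}
Given the gotchas above, a configuration like this would probably avoid the Lovins and Finnish values on Solr 1.4.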
<<Anchor(WordDelimiterFilter)>>
==== solr.WordDelimiterFilterFactory ====