Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2010/02/05 16:24:08 UTC

[Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by RobertMuir

The "AnalyzersTokenizersTokenFilters" page has been changed by RobertMuir.
The comment on this change is: beef up / disambiguate the snowball docs.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=73&rev2=74

--------------------------------------------------

  
    Example: "riding", "rides", "horses" ==> "ride", "ride", "hors".
  
+ Note: This filter deviates in a few small ways from the published Porter algorithm, so its output can differ slightly from the "Porter" algorithm available in `solr.SnowballPorterFilterFactory`.
+ For details, see the section "Points of difference from the published algorithm" on [[http://tartarus.org/~martin/PorterStemmer/|the Porter Stemming Algorithm page]].
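  
  For comparison, a minimal sketch of two field types, assuming the filter documented above is `solr.PorterStemFilterFactory`; the field type names and the whitespace tokenizer are hypothetical choices. The first type uses that filter, the second selects the Snowball "Porter" algorithm instead, so the two may stem a small number of words differently. Both lowercase first, since these stemmers expect lowercase input.
  {{{
  <!-- hypothetical field type names; tokenizer choice is an assumption -->
  <fieldtype name="text_porter" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldtype>
  <fieldtype name="text_snowball_porter" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Porter"/>
    </analyzer>
  </fieldtype>
  }}}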
+ 
  <<Anchor(EnglishPorterFilter)>>
  ==== solr.EnglishPorterFilterFactory ====
  
@@ -347, +350 @@

  
  Creates `org.apache.lucene.analysis.SnowballPorterFilter`.
  
- Creates an [[http://snowball.tartarus.org/algorithms/english/stemmer.html|Porter2 stemmer]] from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification.  The language attribute is used to specify the language of the stemmer.
+ Creates a [[http://snowball.tartarus.org/texts/stemmersoverview.html|Snowball stemmer]] from the Java classes generated from a [[http://snowball.tartarus.org/|Snowball]] specification.  The `language` attribute selects the language of the stemmer.
  {{{
  <fieldtype name="myfieldtype" class="solr.TextField">
    <analyzer>
@@ -358, +361 @@

  }}}
  
  Valid values for the language attribute (the value is combined with "Stemmer" to form the name of the Snowball stemmer class, e.g. `English` loads `EnglishStemmer`):
-  * Danish
-  * Dutch
-  * English
-  * Finnish
-  * French
-  * German2
-  * German
-  * Italian
-  * Kp
-  * Lovins
-  * Norwegian
-  * Porter
-  * Portuguese
-  * Russian
-  * Spanish
-  * Swedish
+  * [[http://snowball.tartarus.org/algorithms/danish/stemmer.html|Danish]]
+  * [[http://snowball.tartarus.org/algorithms/dutch/stemmer.html|Dutch]]
+  * [[http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html|Kp]]: The Kraaij-Pohlmann stemming algorithm for Dutch.
+  * [[http://snowball.tartarus.org/algorithms/porter/stemmer.html|Porter]]: The original Porter stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/english/stemmer.html|English]]: The Porter2 stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/lovins/stemmer.html|Lovins]]: The early Lovins stemming algorithm for English.
+  * [[http://snowball.tartarus.org/algorithms/finnish/stemmer.html|Finnish]]
+  * [[http://snowball.tartarus.org/algorithms/french/stemmer.html|French]]
+  * [[http://snowball.tartarus.org/algorithms/german/stemmer.html|German]]
+  * [[http://snowball.tartarus.org/algorithms/german2/stemmer.html|German2]]: A variant of the German algorithm that also handles ä, ö and ü when they are written as ae, oe and ue.
+  * [[http://snowball.tartarus.org/algorithms/italian/stemmer.html|Italian]]
+  * [[http://snowball.tartarus.org/algorithms/norwegian/stemmer.html|Norwegian]]
+  * [[http://snowball.tartarus.org/algorithms/portuguese/stemmer.html|Portuguese]]
+  * [[http://snowball.tartarus.org/algorithms/russian/stemmer.html|Russian]]
+  * [[http://snowball.tartarus.org/algorithms/spanish/stemmer.html|Spanish]]
+  * [[http://snowball.tartarus.org/algorithms/swedish/stemmer.html|Swedish]]
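  
  A minimal sketch of selecting one of the values above; the field type name and tokenizer are hypothetical. Here `language="German2"` would load the `German2Stemmer` class. Because the value becomes part of a class name, it is written exactly as listed, including capitalization.
  {{{
  <!-- hypothetical field type name; tokenizer choice is an assumption -->
  <fieldtype name="text_de" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
    </analyzer>
  </fieldtype>
  }}}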
  
+ <!> Gotchas:
+  * Although the Lovins stemmer is described as faster than Porter/Porter2, in practice it is much slower in Solr, because it is implemented using reflection.
+  * Neither the Lovins nor the Finnish stemmer produces correct output (as of Solr 1.4), due to a [[http://article.gmane.org/gmane.comp.search.snowball/1139|known bug in Snowball]].
  
  <<Anchor(WordDelimiterFilter)>>
  ==== solr.WordDelimiterFilterFactory ====