Posted to commits@solr.apache.org by ja...@apache.org on 2024/02/27 15:25:35 UTC

(solr) branch branch_9x updated: SOLR-15444 Document the SwedishMinimalStemmer (#380)

This is an automated email from the ASF dual-hosted git repository.

janhoy pushed a commit to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/solr.git


The following commit(s) were added to refs/heads/branch_9x by this push:
     new 004e78f7167 SOLR-15444 Document the SwedishMinimalStemmer (#380)
004e78f7167 is described below

commit 004e78f7167e6961d76bb332986bcdf27ee85000
Author: Jan Høydahl <ja...@apache.org>
AuthorDate: Tue Feb 27 16:22:14 2024 +0100

    SOLR-15444 Document the SwedishMinimalStemmer (#380)
    
    (cherry picked from commit d0ced20ab41045d8beb268dfcbf316aa59e9a4cf)
---
 .../modules/indexing-guide/pages/language-analysis.adoc  | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc b/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc
index 9609126aebe..669b40d674b 100644
--- a/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc
+++ b/solr/solr-ref-guide/modules/indexing-guide/pages/language-analysis.adoc
@@ -115,6 +115,7 @@ A sample fieldType configuration could look like this:
 
 IMPORTANT: When adding the same token twice, it will also score twice (double), so you may have to re-tune your ranking rules.
 
+[#stemmeroverridefilterfactory]
 == StemmerOverrideFilterFactory
 
 Overrides stemming algorithms by applying a custom mapping, then protecting these terms from being modified by stemmers.
@@ -3228,16 +3229,25 @@ Lucene includes an example stopword list.
 
 ==== Swedish Stem Filter
 
-Solr includes two stemmers for Swedish: one in the `solr.SnowballPorterFilterFactory language="Swedish"`, and a lighter stemmer called `solr.SwedishLightStemFilterFactory`.
+Solr includes three stemmers for Swedish: one in the `solr.SnowballPorterFilterFactory language="Swedish"`, a lighter stemmer called `solr.SwedishLightStemFilterFactory`, and a minimal stemmer `solr.SwedishMinimalStemFilterFactory`.
+
+The Light variant is based on simple rules, and removes suffixes like `-het`, `-heten`, `-else`, `-elser` etc., while the Minimal one only tries to normalize singular/plural endings like `-er`, `-ar`, `-arne` etc. See {lucene-javadocs}/analysis/common/org/apache/lucene/analysis/sv/package-summary.html[the Lucene javadocs] for more information.
+
+[NOTE]
+====
+The Swedish Light and Minimal stemmers are known to produce many conflicting word stems, significantly hurting search precision. It may be necessary to provide an extensive list of custom stemmer mappings to counteract this, e.g. using the xref:stemmeroverridefilterfactory[StemmerOverrideFilter].
+====
+
 Lucene includes an example stopword list.
 
 Also relevant are the <<Scandinavian,Scandinavian normalization filters>>.
 
-*Factory class:* `solr.SwedishStemFilterFactory`
+*Factory class:* `solr.SwedishStemFilterFactory`, `solr.SwedishLightStemFilterFactory` and `solr.SwedishMinimalStemFilterFactory`.
 
 *Arguments:* None
 
-*Example:*
+
+*Example (SwedishLightStemFilterFactory):*
 
 [.dynamic-tabs]
 --