Posted to commits@solr.apache.org by ja...@apache.org on 2021/05/24 15:30:33 UTC

[solr] branch main updated: SOLR-15401: Document NorwegianNormalizationFilter (#132)

This is an automated email from the ASF dual-hosted git repository.

janhoy pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/solr.git


The following commit(s) were added to refs/heads/main by this push:
     new 12a777e  SOLR-15401: Document NorwegianNormalizationFilter (#132)
12a777e is described below

commit 12a777e8001e0121af8adf6f89a3ae08429e1f02
Author: Jan Høydahl <ja...@users.noreply.github.com>
AuthorDate: Mon May 24 17:30:23 2021 +0200

    SOLR-15401: Document NorwegianNormalizationFilter (#132)
---
 solr/CHANGES.txt                               |  2 ++
 solr/solr-ref-guide/src/language-analysis.adoc | 46 +++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index 2129d2f..dc962cf 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -295,6 +295,8 @@ Other Changes
 
 * SOLR-15222: userfiles dir will only be created in SolrCloud mode (Mike Drob)
 
+* SOLR-15401: Document the new NorwegianNormalizationFilter introduced in LUCENE-9929. (janhoy)
+
 * SOLR-15409: Upgrade to Zookeeper 3.7.0 (Mike Drob)
 
 Bug Fixes
diff --git a/solr/solr-ref-guide/src/language-analysis.adoc b/solr/solr-ref-guide/src/language-analysis.adoc
index 20dd95e..3f3fbef 100644
--- a/solr/solr-ref-guide/src/language-analysis.adoc
+++ b/solr/solr-ref-guide/src/language-analysis.adoc
@@ -2018,7 +2018,7 @@ Solr includes two classes for stemming Norwegian, `NorwegianLightStemFilterFacto
 
 Another option is to use the Snowball Porter Stemmer with an argument of language="Norwegian".
 
-Also relevant are the <<Scandinavian,Scandinavian normalization filters>>.
+For normalization, there is a `NorwegianNormalizationFilterFactory`, a variant of the <<Scandinavian,Scandinavian normalization filters>> with folding rules tuned for Norwegian.
 
 ==== Norwegian Light Stemmer
 
@@ -2125,6 +2125,50 @@ The `NorwegianMinimalStemFilterFactory` stems plural forms of Norwegian nouns on
 
 *Out:* "bil"
 
+==== Norwegian Normalization Filter
+
+This filter normalizes the interchangeable Scandinavian characters æÆäÄöÖøØåÅ and their folded variants (ae, oe and aa) by transforming them to æÆøØåÅ. It is a variant of `ScandinavianNormalizationFilter`, with folding rules customized for Norwegian.
+
+*Factory class:* `solr.NorwegianNormalizationFilterFactory`
+
+*Arguments:* None
+
+*Example:*
+
+[.dynamic-tabs]
+--
+[example.tab-pane#byname-lang-norwegian]
+====
+[.tab-label]*With name*
+[source,xml]
+----
+<analyzer>
+<tokenizer name="standard"/>
+<filter name="lowercase"/>
+<filter name="norwegianNormalization"/>
+</analyzer>
+----
+====
+[example.tab-pane#byclass-lang-norwegian]
+====
+[.tab-label]*With class name (legacy)*
+[source,xml]
+----
+<analyzer>
+<tokenizer class="solr.StandardTokenizerFactory"/>
+<filter class="solr.LowerCaseFilterFactory"/>
+<filter class="solr.NorwegianNormalizationFilterFactory"/>
+</analyzer>
+----
+====
+--
+
+*In:* "blåbærsyltetøy blåbärsyltetöy blaabaersyltetoey blabarsyltetoy"
+
+*Tokenizer to Filter:* "blåbærsyltetøy", "blåbärsyltetöy", "blaabaersyltetoey", "blabarsyltetoy"
+
+*Out:* "blåbærsyltetøy", "blåbærsyltetøy", "blåbærsyltetøy", "blabarsyltetoy"
+
 === Persian
 
 ==== Persian Filter Factories