Posted to oak-commits@jackrabbit.apache.org by th...@apache.org on 2023/05/24 13:50:45 UTC

[jackrabbit-oak] branch OAK-10262 created (now fa4faaad75)

This is an automated email from the ASF dual-hosted git repository.

thomasm pushed a change to branch OAK-10262
in repository https://gitbox.apache.org/repos/asf/jackrabbit-oak.git


      at fa4faaad75 OAK-10262 Document ASCIIFolder and OakAnalyzer

This branch includes the following new commits:

     new fa4faaad75 OAK-10262 Document ASCIIFolder and OakAnalyzer

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

[jackrabbit-oak] 01/01: OAK-10262 Document ASCIIFolder and OakAnalyzer

Posted by th...@apache.org.

thomasm pushed a commit to branch OAK-10262
in repository https://gitbox.apache.org/repos/asf/jackrabbit-oak.git

commit fa4faaad7557fd18bf18f37dfb64ded39f7be3a6
Author: Thomas Mueller <th...@apache.org>
AuthorDate: Wed May 24 15:50:31 2023 +0200

    OAK-10262 Document ASCIIFolder and OakAnalyzer
---
 oak-doc/src/site/markdown/query/lucene.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/oak-doc/src/site/markdown/query/lucene.md b/oak-doc/src/site/markdown/query/lucene.md
index c50af4f9d7..711d0e3005 100644
--- a/oak-doc/src/site/markdown/query/lucene.md
+++ b/oak-doc/src/site/markdown/query/lucene.md
@@ -762,10 +762,15 @@ defaults to 5
 
 #### <a name="analyzers"></a>Analyzers
 
+If no analyzer is specified, then `OakAnalyzer` is used, which uses the
+Apache Lucene `StandardTokenizer`, the `LowerCaseFilter`,
+and the `WordDelimiterFilter` with the following options:
+`GENERATE_WORD_PARTS`, `STEM_ENGLISH_POSSESSIVE`, and `GENERATE_NUMBER_PARTS`.
+
 `@since Oak 1.5.5, 1.4.7, 1.2.19`
-Unless custom analyzer is configured (as documented below), in-built analyzer
-can be configured to include original term as well to be indexed. This is
-controlled by setting boolean property `indexOriginalTerm` on analyzers node.
+Unless a custom analyzer is explicitly configured (as documented below), the built-in analyzer
+can be configured to also index the original term (`PRESERVE_ORIGINAL`). This is
+controlled by setting the boolean property `indexOriginalTerm` on the analyzers node:
 
     /oak:index/assetType
       - jcr:primaryType = "oak:QueryIndexDefinition"
@@ -845,7 +850,17 @@ all the other components (e.g. `charFilters`, `Synonym`) are optional.
 
 #### Examples
 
-Adding stemming support
+To convert umlauts using ASCII folding, use:
+```
+    + analyzers
+      + default
+        + tokenizer
+          - name = "Standard"
+        + filters (nt:unstructured) // the filters need to be ordered
+          + ASCIIFolding
+```
+
+For stemming support, use:
 ```
 1. Use an analyzer which has stemming included by default e.g. EnglishAnalyzer which has PorterStemFilter.
     + analyzers