Posted to issues@jena.apache.org by GitBox <gi...@apache.org> on 2022/10/30 09:31:53 UTC

[GitHub] [jena] OyvindLGjesdal commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

OyvindLGjesdal commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1296186174

   I think the current documentation implies that Jena follows Lucene's behavior, since it states several times that Lucene's StandardAnalyzer is used (and so, implicitly, its behavior?):
   
   >  The default analyzer defaults to Lucene’s StandardAnalyzer.
   
   >  If a Lucene or Elasticsearch text index is used, then by default the Lucene StandardAnalyzer is used.
   
   > The multilingual analyzer becomes the default analyzer and the Lucene StandardAnalyzer is the default analyzer used when there is no language tag.
   
   Maybe a note like the following could be added to the documentation:
   
   **Note:** From Lucene version 9, English stopwords are no longer removed by default in StandardAnalyzer. This also changes the default behavior for Jena 4.X. You can keep the old behavior by configuring a custom analyzer in the assembler. (Link to the custom analyzer docs, or to the assembler source containing the list of English stop words?)
   
   List from https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L48:
   ```
   ("a" "an" "and" "are" "as" "at" "be" "but" "by" "for" "if" "in"
    "into" "is" "it" "no" "not" "of" "on" "or" "such" "that" "the"
    "their" "then" "there" "these" "they" "this" "to" "was" "will" "with")
   ```
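   To make the effect of the change concrete, here is a minimal, self-contained Java sketch. `StopWordDemo` and `analyze` are hypothetical names, and the tokenizer is a deliberate simplification (lowercase, then split on anything that is not a letter or digit), not Lucene's `StandardTokenizer`, which follows Unicode word-break rules. It only illustrates what supplying or omitting the stop-word set above does to the token stream.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Locale;
   import java.util.Set;

   // Hypothetical stand-in for an analyzer chain: lowercase, split on
   // non-alphanumeric characters, then drop any token found in the stop set.
   // NOT Lucene's StandardTokenizer; it only shows the stop-word effect.
   public class StopWordDemo {

       // The 33 English stop words listed in Lucene's EnglishAnalyzer.
       static final Set<String> ENGLISH_STOP_WORDS = Set.of(
           "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
           "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
           "their", "then", "there", "these", "they", "this", "to", "was", "will", "with");

       // Tokenizes text; tokens contained in stopWords are removed
       // (an empty set means every token is kept).
       static List<String> analyze(String text, Set<String> stopWords) {
           List<String> tokens = new ArrayList<>();
           for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
               // split() can yield a leading empty string; skip it.
               if (!raw.isEmpty() && !stopWords.contains(raw)) {
                   tokens.add(raw);
               }
           }
           return tokens;
       }

       public static void main(String[] args) {
           String text = "The cat is on the mat";
           // No stop set: analogous to the new StandardAnalyzer default.
           System.out.println(analyze(text, Set.of()));
           // English stop set: analogous to the old default behavior.
           System.out.println(analyze(text, ENGLISH_STOP_WORDS));
       }
   }
   ```

   Run as-is, this prints `[the, cat, is, on, the, mat]` followed by `[cat, mat]`. With Lucene 9 itself, the equivalent would presumably be passing `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` to the `StandardAnalyzer(CharArraySet)` constructor, but that is worth checking against the Lucene javadoc.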


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

