Posted to issues@jena.apache.org by GitBox <gi...@apache.org> on 2022/10/15 18:20:39 UTC

[GitHub] [jena] OyvindLGjesdal opened a new issue, #1581: Upgrade lucene library to 9.4.0 for jena-text

OyvindLGjesdal opened a new issue, #1581:
URL: https://github.com/apache/jena/issues/1581

   ### Version
   
   4.7.0-SNAPSHOT
   
   ### Feature
   
   There is a migration guide:
   
   https://lucene.apache.org/core/9_4_0/MIGRATE.html
   
   For Jena to build, it seems to be enough to change the version and to migrate the dependency `org.apache.lucene:lucene-analyzers-common` to `org.apache.lucene:lucene-analysis-common`.
   
   However, the migration guide states that Lucene Core now logs some warnings and errors through Java Util Logging (JUL):
   
   > Lucene Core now logs certain warnings and errors using Java Util Logging (JUL). It is therefore recommended to install wrapper libraries with JUL logging handlers to feed the log events into your app's own logging system.
   
   > Under normal circumstances Lucene won't log anything, but in the case of a problem users should find the logged information in the usual log files.
   
   > Lucene also provides a JavaLoggingInfoStream implementation that logs IndexWriter events using JUL.
   
   > To feed Lucene's log events into the well-known Log4J system, we refer to the [Log4j JDK Logging Adapter](https://logging.apache.org/log4j/2.x/log4j-jul/index.html) in combination with the corresponding system property: java.util.logging.manager=org.apache.logging.log4j.jul.LogManager.
   
   I *think* the options are:
   
   * Don't follow the recommendation: messages go to the console at runtime (JUL's default `ConsoleHandler` writes to stderr) and should be visible in the different contexts Jena runs in.
   * Add a dependency (to the main pom.xml and the jena-text pom.xml?) on https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-jul and add a note to the jena-text documentation about setting the `java.util.logging.manager` system property.
   * OR add the dependency https://mvnrepository.com/artifact/org.slf4j/jul-to-slf4j for the SLF4J bridge (which has a note on the performance of the bridge: https://www.slf4j.org/legacy.html#jul-to-slf4j).
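
To make the bridging options above concrete, here is a minimal sketch of what a JUL bridge does in principle: a `java.util.logging.Handler` receives each record and hands it to the application's own logging system. It uses only the JDK. The class names, the in-memory `sink`, and the logger name `org.apache.lucene.example` are hypothetical and chosen for illustration; this is not how `log4j-jul` or `jul-to-slf4j` are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical sketch: forwards JUL records to an application-owned sink. */
class JulForwardingSketch {

    /** Collects formatted messages; a real bridge would call the app's logging API here. */
    static final class ForwardingHandler extends Handler {
        final List<String> sink = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (record == null || !isLoggable(record)) {
                return;
            }
            sink.add(record.getLevel() + " " + record.getLoggerName() + ": " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Logs through JUL and returns what the handler received. */
    static List<String> demo() {
        // Hypothetical logger name, used only for this demonstration.
        Logger logger = Logger.getLogger("org.apache.lucene.example");
        logger.setUseParentHandlers(false); // keep the demo off the default console handler
        ForwardingHandler handler = new ForwardingHandler();
        logger.addHandler(handler);
        try {
            logger.warning("index problem");
            logger.fine("below the default INFO level, so the logger drops it");
        } finally {
            logger.removeHandler(handler);
        }
        return handler.sink;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The performance note linked for `jul-to-slf4j` follows from this design: the handler only sees a record after JUL has already created it, so the cost of building the record is paid before the target logging system can decide to discard it.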
   
   **Change in behaviour**:
   
   StandardAnalyzer looks like it is used by default:
   
   > English stopwords are no longer removed by default in StandardAnalyzer ([LUCENE-7444](https://issues.apache.org/jira/browse/LUCENE-7444))
   To retain the old behaviour, pass EnglishAnalyzer.ENGLISH_STOP_WORDS_SET as an argument to the constructor
   
   I've looked through the other notes, mostly by checking for usage (`grep -R` in the `jena-text` folder), and I think these are the parts of the migration guide that affect Jena. One further risk is that the upgrade could break custom drop-in implementations configured by users, since there are a lot of changes to the artifact paths and elsewhere.
   
   ### Are you interested in contributing a solution yourself?
   
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] OyvindLGjesdal commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
OyvindLGjesdal commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1304066527

   Changed the version to 9.4.1 to pick up a recent bug fix (see https://lucene.apache.org/core/9_4_1/changes/Changes.html#v9.4.1.bug_fixes). I don't know if it affects us.




[GitHub] [jena] afs commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1296268922

   @OyvindLGjesdal -- good find and a good plan to switch and note what the old setting was in the documentation.
   




[GitHub] [jena] phtyson commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
phtyson commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1295971397

   Regarding the default analyzer, I'm torn about this. Did jena-text originally use the default analyzer because that was a design choice, or just because it was the default? If the design choice was (and is) to provide a default analyzer with English stop words, then that should be provided with the upgrade.
   However, I can see an argument for just using lucene pretty much how it comes--the real value to jena community is that it's connected. Lucene is highly customizable, and it might be presumptuous to assume any type of analyzer is suitable "by default".
   But the principle of least surprise for upgrading users would be to keep the default behavior as close to unchanged as possible.




[GitHub] [jena] afs commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1295963674

   Email sent: https://lists.apache.org/thread/367pqwnf1zjbb0t3or9vwxlylqgrdd59




[GitHub] [jena] afs commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1303702073

   Absent any other information, let's do this upgrade.




[GitHub] [jena] afs commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1279993073

   Thanks for the investigation. Upgrading Lucene is always a slight roadbump because of file format changes.
   
   All these issues will need to be in the release announcement.
   
   This would be good to pre-announce on users@jena and dev@jena and ask for feedback.
   
   Versions: IIRC (and it's been a while so caution!) `lucene-backwards-codecs` avoids the need to reload data on a major version change. Is this still true this time?
   
   JUL:
   Weird choice at this stage in the lifetime of Lucene.
   
   Use `jul-to-slf4j` - it's used elsewhere in Jena. We don't assume Log4j2 is the providing logging system - it could be logback. There is no way round the potentially cyclic setup if the app wants JUL as the providing logging system - it'll have to exclude `jul-to-slf4j` itself. (This includes one of my own projects! JUL is nice and light when you just want logging messages out.)
   
   StandardAnalyzer:
   
   Not sure whether we should set `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` in the Java code (for compatibility) or use this point in time to make the jump. `StandardAnalyzerAssembler` is impacted.
   
   Another "ask" on users@ thing.
   




[GitHub] [jena] OyvindLGjesdal commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
OyvindLGjesdal commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1296186174

   I think the current documentation points to following the Lucene behavior, since it is mentioned multiple times that the StandardAnalyzer from Lucene is used (and implicitly its behavior?)
   
   >  The default analyzer defaults to Lucene’s StandardAnalyzer.
   
   >  If a Lucene or Elasticsearch text index is used, then by default the Lucene StandardAnalyzer is used.
   
   > The multilingual analyzer becomes the default analyzer and the Lucene StandardAnalyzer is the default analyzer used when there is no language tag.
   
   Maybe a note could be added in the documentation
   
   **Note** From Lucene version 9, English stopwords are no longer removed by default in StandardAnalyzer. This also changes the default behavior for Jena 4.X. You can keep the old behavior by configuring a custom analyzer in the assembler. (Link to the custom analyzer documentation, or to the assembler source code containing the list of English stop words?)
   
   List from https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L48:
   ```
   ("a" "an" "and" "are" "as" "at" "be" "but" "by" "for" "if" "in" 
    "into" "is" "it" "no" "not" "of" "on" "or" "such" "that" "the" 
   "their" "then" "there" "these" "they" "this" "to" "was" "will" "with")  
   ```
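
To illustrate what the change means for indexed text, the following sketch filters tokens against the stop word list above. It is a deliberately simplified stand-in written against the JDK only: the class name `StopWordSketch` and the regex-based tokenizer are assumptions for illustration and do not reproduce Lucene's `StandardAnalyzer` tokenization. In Lucene itself, the migration note quoted earlier says the old behaviour is retained by passing `EnglishAnalyzer.ENGLISH_STOP_WORDS_SET` to the `StandardAnalyzer` constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of stop word removal; not Lucene's tokenizer. */
class StopWordSketch {

    /** The English stop words listed above. */
    static final Set<String> ENGLISH_STOP_WORDS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
        "their", "then", "there", "these", "they", "this", "to", "was", "will", "with")));

    /** Lowercases, splits on non-word characters, and drops any token in stopWords. */
    static List<String> tokens(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The index of the graph";
        // Old default: stop words removed before indexing.
        System.out.println(tokens(text, ENGLISH_STOP_WORDS));
        // New default: an empty stop set, so every token is kept.
        System.out.println(tokens(text, Collections.<String>emptySet()));
    }
}
```

With the old default, a query for "the" alone would match nothing because the word never reached the index; with the new default, it is indexed and searchable like any other term.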




[GitHub] [jena] OyvindLGjesdal commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
OyvindLGjesdal commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1296193189

   I looked at jena-geosparql and tried reusing its SLF4J bridge utility method, `routeJULtoSLF4J();`.
   
   On another note, I see `TextQuery` injects the textindexdump and textindexer, so should the `routeJULtoSLF4J` method be updated to route only if the bridge is not already installed?
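
The "route only if not installed" idea can be sketched with JUL alone. The class `RouteOnceSketch`, the marker type `BridgeHandler`, and both helper methods are hypothetical, and this does not show what Jena's `routeJULtoSLF4J` actually does. With the real bridge, `SLF4JBridgeHandler` in `jul-to-slf4j` offers static `isInstalled()` and `install()` methods that could play the same roles.

```java
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical sketch: install a bridge handler at most once per logger. */
class RouteOnceSketch {

    /** Stand-in for a bridge handler; a real one would forward the record. */
    static final class BridgeHandler extends Handler {
        @Override
        public void publish(LogRecord record) { }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Returns true if a BridgeHandler is already attached to the logger. */
    static boolean isInstalled(Logger logger) {
        for (Handler handler : logger.getHandlers()) {
            if (handler instanceof BridgeHandler) {
                return true;
            }
        }
        return false;
    }

    /** Attaches a BridgeHandler unless one is present; returns true if it was added. */
    static synchronized boolean routeOnce(Logger logger) {
        if (isInstalled(logger)) {
            return false;
        }
        logger.addHandler(new BridgeHandler());
        return true;
    }

    public static void main(String[] args) {
        // A real setup would target the root logger, Logger.getLogger("");
        // a named logger keeps this demo from touching global state.
        Logger logger = Logger.getLogger("route.once.demo");
        System.out.println(routeOnce(logger)); // first call installs the handler
        System.out.println(routeOnce(logger)); // second call is a no-op
    }
}
```

Without such a guard, each caller that routes JUL would attach another handler, and every log record would then be forwarded once per attached handler.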




[GitHub] [jena] afs commented on issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1581:
URL: https://github.com/apache/jena/issues/1581#issuecomment-1306032695

   Thanks! Good to pick up the incremental version - Jena's experience is that Lucene has stuck to semantic versioning quite carefully.




[GitHub] [jena] afs closed issue #1581: Upgrade lucene library to 9.4.0 for jena-text

Posted by GitBox <gi...@apache.org>.
afs closed issue #1581: Upgrade lucene library to 9.4.0 for jena-text
URL: https://github.com/apache/jena/issues/1581

