You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/24 20:18:46 UTC

[GitHub] [lucene] magibney commented on pull request #380: LUCENE-10171 - Fix dictionary-based OpenNLPLemmatizerFilterFactory caching issue

magibney commented on pull request #380:
URL: https://github.com/apache/lucene/pull/380#issuecomment-1050227228


   This patch applies cleanly and all tests pass. I plan to commit this within the next few days, because i think it does improve things (targeting 9.1 release).
   
   But I want to go on the record mentioning that there are a number of things I've noticed in the process of looking at this code (completely orthogonal to this PR) that make me a bit uncomfortable:
   
   1. All the objects retrieved through OpenNLPOpsFactory are cached in [static maps](https://github.com/apache/lucene/blob/main/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/OpenNLPOpsFactory.java#L41-L48) that are [never cleared](https://github.com/apache/lucene/blob/main/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/OpenNLPOpsFactory.java#L191-L199).
   2. These maps are populated in a way where there's a race condition (could end up with multiple copies of these objects being held external to this class).
   
   wrt this PR in particular, I'm nervous about the fact that everything _but_ the DictionaryLemmatizers have long been cached as objects, which makes me wonder if there was a reason that from this class's inception, the dictionaryLemmatizers have been cached as String versions of the raw dictionary. But I've tried to chase down this line of reasoning, and still can't find any obvious problem with this change.
   
   Analogous to not "letting the perfect be the enemy of the good", I'm going to not "let the 'not-so-good' be the enemy of the 'probably worse'". I hope this is the correct decision.
   
   @spyk thanks for your patience, and for keeping the PR up-to-date. If you can add a CHANGES.txt entry, that's probably warranted here (if only because of the change to the public API).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org