Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/11/21 10:41:12 UTC

[GitHub] [lucene] uschindler commented on pull request #11955: Remove synchronization from OpenNLP integration and add thread-safety tests(checkRandomData)

uschindler commented on PR #11955:
URL: https://github.com/apache/lucene/pull/11955#issuecomment-1321850829

   > > Does this library also check for race conditions that can arise between ResourceLoaderAware::inform vs TokenStream creation and processing? I know it may be out of the scope of this change but I would be curious to know..
   > 
   > Specific to this, I think one potential plan: we could refactor the tests more to check for it. Existing tests are using `BaseTokenStreamTestCase` but we could also test factories with https://github.com/apache/lucene/blob/main/lucene/test-framework/src/java/org/apache/lucene/tests/analysis/BaseTokenStreamFactoryTestCase.java
   > 
   > And maybe we could add evil stuff to this `BaseTokenStreamFactoryTestCase` to root out any factory-specific thread hazards across all of our factories (including opennlp).
   
   Hi, there should not be any race conditions between TokenStreamFactory's constructor, `inform()`, and the creation of token streams. For legacy reasons with Apache Solr there is still the split between the constructor and `inform()`, but actually the factory should initialize itself completely in the constructor, and all fields should be final. I would fix this in Lucene 10 at some point by removing the `ResourceLoaderAware` interface and just allowing the factory to have a `ResourceLoader` passed next to the map in the ctor (optionally, only if needed). I have some plans to do this and I would also fix Solr later. My plan is to allow a factory to declare a ctor with a `ResourceLoader` parameter if it needs one. The SPI code would look for both constructors and call the right one.
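
   A minimal sketch of what such a two-constructor lookup could look like, using plain reflection. Everything here is hypothetical and for illustration only: `FactoryCtorSketch`, `Loader`, `PlainFactory`, `ModelFactory` and `newInstance` are stand-ins, not Lucene's `ResourceLoader` or its actual SPI loading code.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

/** Sketch only: every type here is a hypothetical stand-in, not Lucene API. */
final class FactoryCtorSketch {

  /** Stand-in for a resource loader. */
  public interface Loader {
    String load(String name);
  }

  /** Needs no resources, so it only declares the (Map) constructor. */
  public static final class PlainFactory {
    public final Map<String, String> args;

    public PlainFactory(Map<String, String> args) {
      this.args = Map.copyOf(args);
    }
  }

  /** Needs a resource, so it declares the (Map, Loader) constructor. */
  public static final class ModelFactory {
    public final String model; // final: fully initialized in the constructor

    public ModelFactory(Map<String, String> args, Loader loader) {
      this.model = loader.load(args.get("model"));
    }
  }

  /** Prefers the (Map, Loader) constructor and falls back to (Map). */
  static <T> T newInstance(Class<T> clazz, Map<String, String> args, Loader loader) {
    try {
      Constructor<T> ctor;
      try {
        ctor = clazz.getConstructor(Map.class, Loader.class);
        return ctor.newInstance(args, loader);
      } catch (NoSuchMethodException e) {
        // No loader-aware constructor declared: use the map-only one.
        ctor = clazz.getConstructor(Map.class);
        return ctor.newInstance(args);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
    }
  }
}
```

   With this shape, a factory that needs resources can keep all of its fields final, and a factory that does not need them never sees the loader at all.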
   
   At the moment this is not a problem, because the factories are always created without races: the ctor is called, followed by `inform()`. After that the factory instance is ready to be used. Any code violating this order fails soon, because the factory won't find its resources.
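
   The current two-phase lifecycle can be sketched roughly as follows. This is an illustration under assumptions, not Lucene code: `TwoPhaseInitSketch`, `Loader`, `ModelFactory` and `create()` are hypothetical names, and the explicit `IllegalStateException` only stands in for whatever failure a real factory shows when its resources are missing.

```java
import java.util.Map;

/** Sketch only: hypothetical stand-ins for the ctor + inform() lifecycle. */
final class TwoPhaseInitSketch {

  /** Stand-in for a resource loader. */
  public interface Loader {
    String load(String name);
  }

  public static final class ModelFactory {
    private final String modelName;
    private String model; // set later by inform(), so it cannot be final

    /** Phase 1: the constructor only records the arguments. */
    public ModelFactory(Map<String, String> args) {
      this.modelName = args.get("model");
    }

    /** Phase 2: resources are loaded once the loader is available. */
    public void inform(Loader loader) {
      this.model = loader.load(modelName);
    }

    /** Using the factory before inform() fails right away. */
    public String create() {
      if (model == null) {
        throw new IllegalStateException("inform() was not called for " + modelName);
      }
      return "stream using " + model;
    }
  }
}
```

   As long as one thread runs the ctor and then `inform()` before the instance is handed out, callers of `create()` only ever see the fully initialized state.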


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

