You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2022/11/07 05:22:43 UTC

[GitHub] [lucenenet] NightOwl888 commented on issue #271: TokenStream.IncrementToken() is called after Dispose() is called

NightOwl888 commented on issue #271:
URL: https://github.com/apache/lucenenet/issues/271#issuecomment-1305099323

   @vvdb-architecture
   
   Thanks. So that leaves the question - how do we enforce the [TokenStream workflow contract](https://lucenenet.apache.org/docs/4.8.0-beta00016/api/core/Lucene.Net.Analysis.TokenStream.html) in .NET?
   
   The exception is meant to be a guide to help at development time to ensure all of the methods of the contract are followed. Once that is done, this exception will never actually occur at runtime. So, that means the part that they are worried about - adding an exception handler inside a finally block - is not a thing that will ever need to happen.
   
   One option I have considered is to build a Roslyn code analyzer to enforce this contract at design time, but I am not sure how complex it would be to analyze APIs of the user code no matter how they have it configured and ensure that all of the methods are called in the right sequence and the right number of times in the code. They could be building one or more abstractions that do some of the `TokenStream` operations while leaving the others somewhere else, or they may be in conditional parts of the code, for example.
   
   Some of the rules, such as enforcing the calls to `base.Reset()`, and `base.End()` when they are overridden would be easy to enforce, while others would take more work. The exception in the `Dispose()` method is definitely a simpler approach, but if we had an Roslyn code analyzer that worked reliably, it would help to speed up the development process by catching this sort of thing at design time instead of runtime.
   
   However, this still doesn't get us out of the situation we are in where `close()` then a call to `reset()` to start the `TokenStream` over from the beginning is allowed in Lucene - `Dispose()` is meant to be the final call that doesn't allow anything else to happen to the object. This is still a violation of the `Dispose()` contract in .NET, so we need to dig into whether `Close()` makes sense here, just to inform consumers that it is not actually a `Dispose()` operation. Or maybe `End()` is supposed to signify when the end of the stream is reached (after which case `Reset()` is allowed) and `Dispose()` is supposed to signify the end of the operation. In that case the [tight loop](https://github.com/apache/lucenenet/blob/081edeed35b190c1d535dcfdfeb91143f0ef818f/src/Lucene.Net/Index/DocFieldProcessor.cs#L280-L284) should be refactored never to call `Dispose()` inside of the loop and have a final loop at the end that calls `Dispose()` on all of the `TokenStreams` to ensure methods n
 ever are called after `Dispose()`.
   
   According to [this error message](https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/analysis/Tokenizer.java#L110-L112), it is supposed to be invalid to call `reset()` multiple times. If that is really the case, then the [tight loop](https://github.com/apache/lucenenet/blob/081edeed35b190c1d535dcfdfeb91143f0ef818f/src/Lucene.Net/Index/DocFieldProcessor.cs#L280-L284) can be considered a bug that needs to be fixed, and we don't need to change `Dispose()` back to `Close()`. Although, the bug may boil down to the fact we added an [extra call to `Dispose()` in .NET](https://github.com/apache/lucenenet/blob/dfae964ce0a1b066fd808275fd81d90385cde3fe/src/Lucene.Net/Analysis/Tokenizer.cs#L76) and it shouldn't actually be there.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org