You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2021/12/14 08:45:20 UTC

[GitHub] [lucenenet] NightOwl888 commented on pull request #572: Lucene.Net.Search.ControlledRealTimeReopenThread: Refactor to integrate changes from #513

NightOwl888 commented on pull request #572:
URL: https://github.com/apache/lucenenet/pull/572#issuecomment-993308412


   Okay, let's try to establish first what we are trying to achieve:
   
   1. A direct line-by-line port from Java to C#
   2. A (mostly) lockless implementation that functions similarly to Lucene's original design
   
   We tried a direct line-by-line port, and it doesn't function correctly in .NET because of low level differences in the way concurrency is implemented. It was changed to its current (mostly) lockless form in the following 2 commits:
   
   https://github.com/apache/lucenenet/commit/d3621f0c3a02bf3c82603c1d94916e70e06b3035
   https://github.com/apache/lucenenet/commit/dfb2e92839dd29cbdc47d8c37af9325ea351709b
   
   IMO, as long as it functions correctly (i.e. passes all of the tests and doesn't deadlock), the form is less important than the function. The fact that it is lockless is more in line with how modern .NET code is written. So, as long as we protect the state so it is read and written to atomically, this is fine.
   
   > I **strongly** prefer my variable naming of intrinsic for the additional `EventWaitHandle` over the new name of `m_signal`.  The former helps throw light on the fact that it’s being used to mimic java's ability to signal on the intrinsic class which c# can’t do and hence the use of the extra`EventWaitHandle`.
   
   .NET's direct equivalent to `this.notifyAll()` is [`Monitor.PulseAll(this)`](https://docs.microsoft.com/en-us/dotnet/api/system.threading.monitor.pulseall?view=net-6.0), which a subclass would automatically be able to receive if we were using it. I attempted to change it (calling `UninterruptableMonitor.PulseAll(this)`) instead of using the `EventWaitHandle` but this doesn't play well with the changes to the locks to make the class mostly lockless. The problem here is that the code is no longer waiting on `this`, so pulsing on it isn't going to wake up the threads (but changing to wait on `this` also deadlocks). I suspect we would also need to change the lock that protects `searchingGen` and `waitingGen` to lock on a different object than `this` if we were to pull that off.
   
   The class is not sealed in Lucene (meaning it was designed with inheritance in mind). So, if we are going to use `EventWaitHandle`, we need to mark its declaration protected to keep the same level of extensibility as Lucene, otherwise a subclass won't be able to receive the notification from the base class.
   
   `intrinsic` seems like a very unusual name to me, being that the "intrinsic" way to do this would be to use `Monitor.PulseAll(this)`. It doesn't explain to the consumer what the event does (as part of the .NET component), only what it *did in Java*, which the person using the class might not be aware of. `m_signal` makes more sense IMO. But  `m_notify` would be fine if that makes it more clear to developers who may be looking at the Java source code as well as the ones who haven't seen it.
   
   > You changed `refreshStartGen` from a `long` to a `AtomicInt64`.  I don’t believe this changes is necessary.  This variable is a `long` in the original java code.
   
   In .NET, reading `long` is not atomic. This is to ensure when the `Run()` method sets it, another call to the `RefreshDone()` method will be able to read it atomically.
   
   The test for `IRefreshListener` [uses an `AtomicBoolean`](https://github.com/apache/lucenenet/blob/19cb0c1d806462aad2b6297e0885491062887456/src/Lucene.Net.Tests/Search/TestControlledRealTimeReopenThread.cs#L612) when running the test. That is a good indication we are not expecting the call to `RefreshDone()` to be on the same thread as `Run()`. While `Run()` calls `MaybeRefreshBlocking()` right after setting the variable, there is a non-zero chance another thread could call `MaybeRefresh()` or `MaybeRefreshBlocking()` and fire the listener at the same time the write is being done, since the `ReferenceManager<T>` is passed in through the constructor.
   
   > You changed `isDispose` from `bool` to `AtomicBoolean`.  I don’t believe this change is necessary.  Reading and writing a Boolean is inherently atomic  see [Is a bool read/write atomic in C#](https://stackoverflow.com/questions/59422/is-a-bool-read-write-atomic-in-c-sharp#answer-59430)
   
   Reading a `bool` is atomic. Writing a `bool` is atomic. But doing both operations together cannot be done atomically. The change to `AtomicBoolean` is so we can ensure only one thread enters the `Dispose()` method. 
   
   ```c#
               // This both checks the previous value and updates it to true atomically
               if (isDisposed.GetAndSet(true))
               {
                   return;
               }
   ```
   
   > NOTE: The comment doesn't exist in the source code, we should put it there to ensure it is clear why `AtomicBoolean` is necessary here. In case you are curious, `AtomicBoolean` uses an `int` and uses [Interlocked.Exchange to do both operations atomically](https://github.com/NightOwl888/J2N/blob/b7f80ca424af2b077b5d42beff4d9fbad8b05ef5/src/J2N/Threading/Atomic/AtomicBoolean.cs#L106). `Interlocked`, however does not support `bool`.
   
   The rest of the logic was also reviewed to ensure that one thread is able to successfully dispose all of the resources whether or not there is an exception.
   
   > You removed the holding of a lock on the `Dispose` method.  This is a major drift from the Java implementation.  Please note the java `close` method for this class is a synchronized method.
   
   Actually, @vvdb [removed the lock in 2017](https://github.com/apache/lucenenet/commit/dfb2e92839dd29cbdc47d8c37af9325ea351709b#diff-638564666c0e7e28327d70790e03a2dc00131ad1fd2086a7ac8e62a7b641453eL118) when the class was redesigned to be mostly lockless. He also removed the `reopenLock`.
   
   Vincent sent me the code via email. The whole conversation is in the March, 2017 archive here under the subject `TestSearcherManager_Mem`.
   
   [ControlledRealTimeReopenThread.txt](https://github.com/apache/lucenenet/files/7708264/ControlledRealTimeReopenThread.txt)
   
   http://mail-archives.apache.org/mod_mbox/lucenenet-dev/201703.mbox/browser
   
   Please read the thread. Vincent was aware that there were still some timing issues with the implementation he submitted, so this tracks with your reasoning for changing the timing code.
   
   He also mentioned that as written, this code didn't even function correctly in Java and suggested that we port the code and tests from Lucene 4.9.0 instead.
   
   The while change history of this class can be viewed in Git with the following command:
   
   ```
   git log --follow -p -- src/Lucene.Net/Search/ControlledRealTimeReopenThread.cs
   ```
   
   I once held the view that reverting this to exactly the same state as it was in Java was the best approach, but I have been convinced (with evidence) that it is simply not possible to do it exactly the same way in .NET due to how differently concurrency is implemented between platforms. We need to go in at a lower level and actually port what the code *does* with regard to what is atomic and what is safe to read without being atomic and not worry so much about how differently it *looks* in .NET.
   
   > By removing the `searchingGen = long.MaxValue; ` line in the `Dispose` method and using a conditional assignment of `searchingGen`  in the `RefreshDone` method this new version of the code does not track as closely with the java code as my original ported version.  This change is also why in the `Dispose` method you needed to replace the call to `Intrensic.Set()` and it’s *very* helpful comment with a call to `RefrsehDone()`   which adds more drift from the java implementation.    It's not clear these changes improve the code but they do make it harder to compare to the java code.
   
   Again, the form of how this looks is not as critical as its function. Putting this inside of a method means we don't have to duplicate the locking around this code in Dispose() to ensure the operation is atomic, and is therefore easier to maintain. 
   
   The only purpose of the lock in the current iteration is to ensure both `searchingGen` and `waitingGen` are read and written to atomically, nothing more. The rest of the code is lockless.
   
   ### Timing
   
   What hasn't been established is whether this PR still fixes the bug that you were seeing in the timing. This passes the tests that you added. Is this working the way you expect it to or not?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org