You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucenenet.apache.org by NightOwl888 <gi...@git.apache.org> on 2016/11/02 10:14:14 UTC

[GitHub] lucenenet issue #191: [WIP] Migrating Lucene.Net to .NET Core

Github user NightOwl888 commented on the issue:

    https://github.com/apache/lucenenet/pull/191
  
    FYI - There are 2 branches that still need to be integrated before this is up to date with master:
    
    - https://github.com/NightOwl888/lucenenet/tree/core-tests
    - https://github.com/NightOwl888/lucenenet/tree/analysis-stempel
    
    But neither of those affects the `Lucene.Net.Analysis.Common` or the `Lucene.Net.Expressions` tests, so you are doing the right thing by finding those problems first.
    
    The core-tests branch adds ~120 missing tests and has several dozen bug fixes (including fixing the 2 failing tests in `Lucene.Net.Queries` and 32 failing tests in `Lucene.Net.QueryParser`). The tests in `Lucene.Net.Tests` are now 100% complete (including parts of tests that were previously commented out, but excluding tests that only apply to JRE/JUnit).
    
    Test Context
    ====
    
    The test context issues have been "fixed" by creating stub methods to put the `[Test]` attribute on in the subclasses where the tests are supposed to be run. You probably already know that part, but there have been some additions and the new commit where these can be reverted (if needed) is: 4dbc3590361814d13fae64c8d030820eb4987489
    
    That is, if xUnit properly pulls the base class tests into the right context (if or when it is put in place), this commit can be reverted to rollback those test method stubs.
    
    LuceneNetSpecificAttribute
    ====
    
    I added an attribute to apply to tests that were added as part of the port so they can be filtered (and compared with) the tests from Java Lucene. The filter can be applied in Visual Studio as `trait:LUCENENET`.
    
    SuppressCodecsAttribute
    ====
    
    Fixed this attribute to accept an array of strings and added the correct suppressions to all of the tests in `Lucene.Net.Core`. We still need to fix the Test Framework to randomize codecs and utilize this attribute to filter them.
    
    ChangeNotes
    ====
    
    I noticed you have begun [listing some of the questions you have and unfinished tasks](https://github.com/conniey/lucenenet/blob/netcoremigration/src/Lucene.Net.Core/ChangeNotes.txt) about certain areas of the project and I know a few of the answers and have some comments as well.
    
    > TODO: Confirm HashMap emulates java properly
    
    Although these already seem to be doing the job (tests are passing), we could raise confidence about this by porting some of the tests over from the JDK. I have already begun doing this for certain pieces that were ported from or are meant to emulate parts of Java.
    
    > TODO: Tests need to be written for WeakDictionary
    
    Looks like we [have some tests already](https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests/core/Support/TestWeakDictionary.cs) from Lucene.Net 3.0.3 that exist in the repo, but haven't yet been added to the project. I am not sure how well they match up against the JDK tests.
    
    > TODO: Tests need to be written for IdentityDictionary -> Verify behavior
    
    We should probably move this into the Support namespace. AFAIK, the Support namespace is for pieces that need to be sourced from the JDK and/or close approximations to what is in the JDK that don't exist in the .NET framework. We can then port over the applicable tests for IdentityDictionary from the JDK to verify it works as expected.
    
    > ParallelReader - extra data types, using SortedDictionary in place of TreeMap.  Confirm compatibility.  Looks okay, .NET uses a r/b tree just like Java, and it seems to perform/behave just about the same.
    
    In many cases `SortedDictionary` will function as a suitable replacement for `TreeMap`. The primary difference is that `TreeMap` allows duplicate keys, where `SortedDictionary` does not.
    
    That said, it is unlikely any of the tests cover this scenario because they would have all been written under the assumption that duplicates are allowed (and therefore the functionality doesn't need to be tested per se, since it is already covered under JDK tests).
    
    In `Lucene.Net.Grouping` (WIP), `TreeMap` is required to make it work. There is a suitable TreeMap in the [C5 library](https://github.com/sestoft/C5) (see [The Working Programmer - .NET Collections: Getting Started with C5](https://msdn.microsoft.com/en-us/magazine/jj883961.aspx)), which seems to do the job. I have ported `TreeMap` over to our Support namespace along with `TreeDictionary` and several other dependencies (as well as the tests).
    
    Do note that C5 currently doesn't have .NET core support, but someone has recently made a pull request for it. So whether to integrate it into the Support namespace and get rid of the external dependency or reference it in specific places is a matter of preference.
    
    > FuzzyQuery - uses java.util.PriorityQueue<T>, which .net does not have.  Using SortedList<TKey, TVal> in it's place, which works, but a) isn't a perfect replacement (a SortedList<TKey, TVal> doesn't allow duplicate keys, which is what is sorted, where a PriorityQueue does) and b) it's likely slower than a PriorityQueue<T>. I can't tell if the PriorityQueue that is defined in Lucene.Net.Util would work in its place.
    
    There are 2 different `PriorityQueue` types in Lucene.Net:
    
    - Lucene.Net.Util.PriorityQueue
    - Lucene.Net.Support.PriorityQueue
    
    The one in the `Util` namespace is an exact match for the one in Lucene. The PriorityQueue in the `Support` namespace is intended to be an approximate match for `java.util.PriorityQueue` (it could use some tests to verify that, though).
    
    FuzzyQuery doesn't have a PriorityQueue, so you are likely referring to the wrong type where this issue exists.
    
    > Dispose needs to be implemented properly around the entire library.  IMO, that means that Close should be Obsoleted and the code in Close() moved to Dispose().
    
    AFAIK, `close()` is being universally replaced with `Dispose()` rather than dealing with the (confusing) redundancy. There are cases where it is required, though - for example, `TextWriter`. IMO, removing (where applicable) makes more sense than obsoleting the `Close()` method.
    
    > TODO: NamedThreadFactory.java - Is this needed?  What is it for, just for debugging?
    
    AFAIK, no it is not needed. It is used as a parameter to generate a concrete `ExecutorService` in Java. 
    
    ```java
        ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS,
                                       new LinkedBlockingQueue<Runnable>(),
                                       new NamedThreadFactory("TestIndexSearcher"));
    ```
    
    It looks like someone has already worked out that `TaskScheduler` is the rough equivalent in .NET (not confirmed). In all places where this is utilized in the tests, `TaskScheduler.Default`, or the [`Lucene.Net.Support.LimitedConcurrencyLevelTaskScheduler`](https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Core/Support/LimitedConcurrencyLevelTaskScheduler.cs) (which I grabbed from MSDN) seems to work well enough to get the tests to pass.
    
    While we *could* port the `ThreadPoolExecutor` from the JDK and make it a concrete `TaskScheduler` so we would have a place to utilize the `NamedThreadFactory`, it feels like that is a step away from .NETification rather than just allowing people to use the `TaskScheduler` natively. IMO, it makes more sense just to comment it out or remove the file from the project (but not from the repo - it would be best to keep the file around with comments explaining why it wasn't utilized).
    
    > TODO: LockStressTest.java - Not yet ported.
    > TODO: MMapDirectory.java - Port Issues
    > TODO: NIOFSDirectory.java - Port Issues
    
    These are now ported and most of the bugs they had were addressed.
    
    Additional Items to Add to the List
    -----
    
    **`AlreadyClosedException`** - In .NET, we already have an `ObjectDisposedException` so this is reinventing the wheel. The `IndexReader` and `IndexWriter` classes depend heavily on the right exception type being caught in order to provide the correct response. This may be an issue if a nested .NET type throws an `ObjectDisposedException` rather than the expected `AlreadyClosedException`. While we could partially address this by inheriting `ObjectDisposedException` from `AlreadyClosedException`, and always catch `ObjectDisposedException`, it still provides a window where consumers could incorrectly catch `AlreadyClosedException` and later be bitten by an uncaught `ObjectDisposedException`. We should probably just remove this type and use the native `ObjectDisposedException`.
    
    **`ICharSequence`** - Many types have replaced the use of `ICharSequence` with `string`. Its too bad that .NET doesn't have an equivalent interface as `ICharSequence` that is shared between `string` and `StringBuilder` that could be utilized. But using only `string` has limited the API to a certain extent. A few types within Lucene.Net implement `ICharSequence` (`CharTermAttribute`, `CharBlockArray`, `OpenStringBuilder`, `StringCharSequenceWrapper`). It could save a step to have overloads for `string`, `ICharSequence`, and `StringBuilder` in all places that were `ICharSequence` in Lucene rather than having to call `ToString()` on the latter two.
    
    -----
    
    Is there anything I can do to help you get this up to date?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---