You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2020/07/21 05:33:54 UTC

[GitHub] [lucenenet] bongohrtech opened a new pull request #313: Fix/random seed simple

bongohrtech opened a new pull request #313:
URL: https://github.com/apache/lucenenet/pull/313


   This addresses a fix for random failures in the TestRandomstrings and TestRandomHugeStrings. It seems that in the tight loop using the same Random variable creates some issues "randomly". This was not repeatable using a seed so my guess is it is due to the threads or some other concurrency within the subfunctions.
   
   I did a lot of work to prove that the outputs from RandomAnalysisString where consistent. The issue was CheckAnalysisConsistency and I didnt see an obvious or even obscure reason why it would fail.
   
   I havent run into any test fails so far but that doesnt mean there isnt one in the next run. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] bongohrtech commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

bongohrtech commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-663343798


   > Its a bit strange, but although this seems to have fixed the `TestRandomStrings` tests for `TestICUFoldingFilter` and `TestThaiAnalyzer`, the `TestThaiAnalzyer::TestRandomHugeStrings()` test still fails. But digging into it, they both terminate in the same place, the only difference is the `maxWordLength` parameter is increased.
   > 
   > I suspect we may have a difference in behavior somewhere in `TestUtil.RandomAnalysisString(Random, int, bool)` that may be causing some rare weirdness. Sadly, `TestUtil` has no tests to verify the behavior is doing what it should be doing.
   > 
   > `TestUtil.RandomAnalysisString()` is also called by `Lucene.Net.Analysis.NGram.EdgeNGramTokenizerTest::TestFullUTF8Range()` and `Lucene.Net.Analysis.NGram.NGramTokenizerTest::TestFullUTF8Range()`, both which are also randomly failing. Perhaps one of the paths that `TestUtil.RandomSubString()` is going down is broken, which would explain the randomness. I suggest to divide and conquer - keep excluding the random paths until you find the one that causes the failure to stop happening. That would probably be a bit quicker than reviewing every one of those methods and comparing them against the Java implementation.
   
   I am not seeing the failing tests for TestRandomHugeStrings - can you rerun and perhaps send the seed failing? Ive run this several times and havent hit this issue. However, maybe this is because im using the FindFirstFailingSeed attribute - this sets NUnit.Framework.Internal.Randomizer.InitialSeed and NUnit.Framework.Internal.TestExecutionContext.CurrentContext.CurrentTest.Seed to the same seed which are not under normal conditions. 
   
   RandomAnalysisString is producing the same result each time, I verified this by writing the hashed outputs to a file with the iteration and comparing them on subsequent runs. All identical. I can include this code in the pull request if you want to review?
   
   I will take a look at those other tests now (Lucene.Net.Analysis.NGram.EdgeNGramTokenizerTest::TestFullUTF8Range()/Lucene.Net.Analysis.NGram.NGramTokenizerTest::TestFullUTF8Range())
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] bongohrtech commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

bongohrtech commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-661644111


   FindFirstFailing and Seed attribute files allow to specify a particular seed to run a test under


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] bongohrtech commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

bongohrtech commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-663348518


   Seems both these tests call RandomUnicodeString which is predictably failing on the same seed. This one is solvable i think. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] NightOwl888 closed pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

NightOwl888 closed pull request #313:
URL: https://github.com/apache/lucenenet/pull/313


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] NightOwl888 commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

NightOwl888 commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-661843488

Its a bit strange, but although this seems to have fixed the `TestRandomStrings` tests for `TestICUFoldingFilter` and `TestThaiAnalyzer`, the `TestThaiAnalzyer::TestRandomHugeStrings()` test still fails. But digging into it, they both terminate in the same place, the only difference is the `maxWordLength` parameter is increased.

I suspect we may have a difference in behavior somewhere in `TestUtil.RandomAnalysisString(Random, int, bool)` that may be causing some rare weirdness. Sadly, `TestUtil` has no tests to verify the behavior is doing what it should be doing.

`TestUtil.RandomAnalysisString()` is also called by `Lucene.Net.Analysis.NGram.EdgeNGramTokenizerTest::TestFullUTF8Range()` and `Lucene.Net.Analysis.NGram.NGramTokenizerTest::TestFullUTF8Range()`, both which are also randomly failing. Perhaps one of the paths that `TestUtil.RandomSubString()` is going down is broken, which would explain the randomness. I suggest to divide and conquer - keep excluding the random paths until you find the one that causes the failure to stop happening. That would probably be a bit quicker than reviewing every one of those methods and comparing them against the Java implementation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] NightOwl888 commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

NightOwl888 commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-679378842


   The concurrency issue with `ICUTokenizer` has now been temporarily addressed in #328.
   
   As for changing the `Random` class instantiation, there doesn't seem to be any effect when doing it this way. I am closing this because it doesn't address the issues described in #288. Of course, we should have a `FindFirstFailingSeed` feature to assist with debugging, but depending on what random number generator implementation we use, it may need to be very different than the one here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] bongohrtech commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

bongohrtech commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-663433775


   > > I will take a look at those other tests now (Lucene.Net.Analysis.NGram.EdgeNGramTokenizerTest::TestFullUTF8Range()/Lucene.Net.Analysis.NGram.NGramTokenizerTest::TestFullUTF8Range())
   > 
   > There's no need. This has been patched already in #316.
   
   Ack I wish I had checked my email earlier, this does seem to fix the error. I had found a seed that failed but didnt understand why. I suspected IncrementToken() but clearly not the issue


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] NightOwl888 commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

NightOwl888 commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-667692212

> Adding this before the test runs
NUnit.Framework.Internal.Randomizer.InitialSeed = NUnit.Framework.Internal.TestExecutionContext.CurrentContext.CurrentTest.Seed;

> seems to stop the tests erroring. Is this hiding something tho? If there is one random using a different seed, this shouldn't cause an issue.

Before the test runs how? I tried adding it to the beginning of the test and it didn't seem to make a difference.

I took a deep dive into this and there were several contributors causing these tests to fail.

- The culture sensitivity was broken by the change of default when we switched to `J2N.Character` in 4.8.0-beta00008 without specifying `CultureInfo.InvariantCulture`. Fixed in #321.
- The `ThaiWordBreaker` wasn't set up to handle surrogate pairs, so would fail when there were surrogate pairs in the random input. Fixed in #322.
- Something is causing `ThaiAnalyzer` to fail when using concurrency. This one I haven't found the cause, but being that the behavior changed when I added a `.Clone()` to each of the prototypes, I suspect the issue lies in `ICU4N.RuleBasedBreakIterator` where it loads the Thai break engine. Adding locking also seems to make it fail less, but it still does occasionally. We need to add some concurrency tests to ICU4N for `BreakIterator` when using Thai to see if we can make it fail - there were no tests for concurrency in ICU4J.

There doesn't appear to be anything wrong with `BaseTokenStreamTestCase`, but when I commented out the concurrency check inside `CheckAnalysisConsistency()`, I ran the `TestRandomHugeStrings()` and `TestRandomStrings() tests 10000 times in a row each without a failure. Putting the concurrency check back in makes the failure happen after a few hundred runs.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] NightOwl888 commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

NightOwl888 commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-663372626


   > I will take a look at those other tests now (Lucene.Net.Analysis.NGram.EdgeNGramTokenizerTest::TestFullUTF8Range()/Lucene.Net.Analysis.NGram.NGramTokenizerTest::TestFullUTF8Range())
   
   There's no need. This has been patched already in #316.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] bongohrtech commented on pull request #313: Fix/random seed simple

Posted by GitBox <gi...@apache.org>.

bongohrtech commented on pull request #313:
URL: https://github.com/apache/lucenenet/pull/313#issuecomment-663459085


   Adding this before the test runs
               NUnit.Framework.Internal.Randomizer.InitialSeed = NUnit.Framework.Internal.TestExecutionContext.CurrentContext.CurrentTest.Seed;
   
   seems to stop the tests erroring. Is this hiding something tho? If there is one random using a different seed, this shouldn't cause an issue. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org