Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2020/06/16 18:48:02 UTC

[GitHub] [lucenenet] jregnier opened a new issue #296: IndexOutOfRangeException when searching

jregnier opened a new issue #296:
URL: https://github.com/apache/lucenenet/issues/296


   Hello, I'm getting an IndexOutOfRangeException when searching in some cases. It happens maybe 10% of the time, so I'm unsure what is causing it. See below for the search code and stack trace. I feel like it might be something with the query, but it's more of a guess. Any guidance on this would be very much appreciated.
   
   `var sort = new Sort(new SortField(null, SortFieldType.DOC));
   return _searcher.Search(Query, _reader.NumDocs, sort);`
   
   `FATAL	 Update Data Set System.IndexOutOfRangeException: Index was outside the bounds of the array.
      at Lucene.Net.Store.ByteArrayDataInput.ReadVInt32()
      at Lucene.Net.Codecs.BlockTreeTermsReader.FieldReader.IntersectEnum.Frame.NextLeaf()
      at Lucene.Net.Codecs.BlockTreeTermsReader.FieldReader.IntersectEnum.Next()
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector)
      at Lucene.Net.Search.ConstantScoreAutoRewrite.Rewrite(IndexReader reader, MultiTermQuery query)
      at Lucene.Net.Join.ToParentBlockJoinQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original)
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query)
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n, Sort sort)`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-693725221


   Could someone please provide a minimal example I can run? Even with the descriptions here, there is not enough info to piece together both the code and the data to reproduce this without research and trial and error. There is probably a test that is similar enough to what you are doing in [`TestWildcard`](https://github.com/apache/lucenenet/blob/fc2da940da3ca32c2fe6ae9caf69f36b69f3de7f/src/Lucene.Net.Tests/Search/TestWildcard.cs) to use as a starting point; just modify it accordingly and post it here so we can run it. If you need to, use the `[Repeat]` attribute to run it multiple times to force a failure.
   
   Also, what platform is this happening on and is this x86 or x64?
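   
   For anyone unsure how to answer the platform question, a minimal sketch along these lines prints the runtime and process bitness. The `PlatformReport` class and `Describe` method are hypothetical names used only for illustration; `RuntimeInformation` is assumed to be available (.NET Core, or .NET Framework 4.7.1 and later).
   
   ```c#
   using System;
   using System.Runtime.InteropServices;
   
   public static class PlatformReport
   {
       // Builds a one-line summary of the runtime, process architecture, and OS
       // that can be pasted into a bug report.
       public static string Describe()
       {
           return $"{RuntimeInformation.FrameworkDescription}, " +
                  $"process architecture: {RuntimeInformation.ProcessArchitecture}, " +
                  $"64-bit process: {Environment.Is64BitProcess}, " +
                  $"OS: {RuntimeInformation.OSDescription}";
       }
   }
   ```
   
   Calling `Console.WriteLine(PlatformReport.Describe());` from the application that shows the failure reports the process actually running the search, which can differ from the machine's architecture when a project targets x86.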
   
   Note there are now [8 known failing tests](https://github.com/apache/lucenenet/issues/269) on .NET Framework under x86 in 4.8.0-beta00011 and prior, several of which relate to `FuzzyTermsEnum` and `TopTermsRewrite`. These test failures go away with optimizations disabled, indicating they are likely JIT optimization bugs of some kind. Even in 4.8.0-beta00012 there are still 4 tests failing, and it will be difficult to pin these down because the failures are not happening in debug mode. These tests do not fail on .NET Core/x86 or on .NET Framework/x64.
   
   4.8.0-beta00012 can be downloaded at https://dist.apache.org/repos/dist/dev/lucenenet/ (it is currently pending the [release vote](https://lucenenet.apache.org/contributing/make-release.html), which takes 72 hours). Could someone please confirm this problem still exists on 4.8.0-beta00012?
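   
   As a rough starting point for the kind of repeatable test requested above, the sketch below indexes one document and runs a `FuzzyQuery` under NUnit's `[Repeat]` attribute. The fixture name, field name, and sample text are placeholders rather than data from this issue, and the repeat count of 100 is arbitrary. It assumes the Lucene.Net, Lucene.Net.Analysis.Common, and NUnit packages are referenced; a real repro would substitute the failing analyzer, data, and query.
   
   ```c#
   using Lucene.Net.Analysis.Standard;
   using Lucene.Net.Documents;
   using Lucene.Net.Index;
   using Lucene.Net.Search;
   using Lucene.Net.Store;
   using Lucene.Net.Util;
   using NUnit.Framework;
   
   [TestFixture]
   public class FuzzyQueryReproTest
   {
       [Test]
       [Repeat(100)] // rerun the test body to give an intermittent failure a chance to surface
       public void FuzzySearchDoesNotThrow()
       {
           using var dir = new RAMDirectory();
           using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
   
           // Index a single placeholder document; disposing the writer releases its lock.
           using (var writer = new IndexWriter(dir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
           {
               var doc = new Document();
               doc.Add(new TextField("body", "placeholder text for the failing data", Field.Store.NO));
               writer.AddDocument(doc);
               writer.Commit();
           }
   
           using var reader = DirectoryReader.Open(dir);
           var searcher = new IndexSearcher(reader);
   
           // The query term is a deliberate misspelling so the fuzzy automaton is exercised.
           var query = new FuzzyQuery(new Term("body", "placeholdr"));
           Assert.DoesNotThrow(() => searcher.Search(query, 10));
       }
   }
   ```
   
   Since several reports in this thread involve load or multiple index instances running at once, a single-threaded loop like this may not be enough on its own to trigger the exception.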





[GitHub] [lucenenet] LunarExplorer commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
LunarExplorer commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-695352962


   I get fairly reproducible Lucene.Net.Facet FastTaxonomyFacetCounts System.IndexOutOfRangeException.
   
    Message: 
   exception: 
       System.IndexOutOfRangeException: Index was outside the bounds of the array.
     Stack Trace: 
       FastTaxonomyFacetCounts.Count(IList`1 matchingDocs)
       FastTaxonomyFacetCounts.ctor(String indexFieldName, TaxonomyReader taxoReader, FacetsConfig config, FacetsCollector fc)
       FastTaxonomyFacetCounts.ctor(TaxonomyReader taxoReader, FacetsConfig config, FacetsCollector fc)
   
   I'm happy to provide further info or even hop on some kind of screen share to debug interactively, if that helps.





[GitHub] [lucenenet] mlaufer edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-668034914


   I'm unable to reproduce the bug on 4.8.0-beta00006, I will try 4.8.0-beta00007 next. Hope this helps. 
   
   I was also unable to reproduce on 4.8.0-beta00007.





[GitHub] [lucenenet] mlaufer edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-668009001









[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-708721094


   @willson556 
   
   Thanks for the info.
   
   If a test is too much to ask, could you distill this down to a console app using the failing data set and put it in a repo to share?
   
   If the data is sensitive, do note that both [Azure DevOps](https://azure.microsoft.com/en-us/services/devops/) and [BitBucket](https://bitbucket.org/) allow you to create free private repos that you can then share by invitation. Just use the email address in [my GitHub profile](https://github.com/NightOwl888).
   
   
   
   





[GitHub] [lucenenet] willson556 edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
willson556 edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-708676417


   I am able to reliably reproduce with one of my datasets but I'm not sure if I could write a test to fail. I'm running on .NET Core/x64 with 4.8.0-beta00012.
   
   Similar stack trace to everyone after OP:
   ```
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32) 
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify) 
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance) 
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm) 
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init) 
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions) 
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts) 
      at Lucene.Net.Search.MultiTermQuery.RewriteMethod.GetTermsEnum(MultiTermQuery query, Terms terms, AttributeSource atts) 
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector) 
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query) 
      at Lucene.Net.Search.MultiTermQuery.Rewrite(IndexReader reader) 
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader) 
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original) 
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query) 
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Filter filter, Int32 n) 
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n) 
   ```
   Using this analyzer (I'm just starting to come up to speed with Lucene so I'm not sure the arrangement of filters actually makes any sense):
   ```c#
   public class NGramAnalyzer : Analyzer
   {
       private readonly LuceneVersion version;
       private readonly int minGram;
       private readonly int maxGram;
   
       public NGramAnalyzer(LuceneVersion version, int minGram = 2, int maxGram = 8)
       {
           this.version = version;
           this.minGram = minGram;
           this.maxGram = maxGram;
       }
   
       /// <inheritdoc />
       protected override TextReader InitReader(string fieldName, TextReader reader)
       {
           var charMap = new NormalizeCharMap.Builder();
           charMap.Add("_", " ");
           return new MappingCharFilter(charMap.Build(), reader);
       }
   
       /// <inheritdoc />
       protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
       {
           // Splits words at punctuation characters, removing punctuation.
           // Splits words at hyphens, unless there's a number in the token...
           // Recognizes email addresses and internet hostnames as one token.
           var tokenizer = new StandardTokenizer(version, reader);
   
           TokenStream filter = new StandardFilter(version, tokenizer);
   
           // Normalizes token text to lower case.
           filter = new LowerCaseFilter(version, filter);
   
           // Removes stop words from a token stream.
           filter = new StopFilter(version, filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
   
           filter = new EnglishMinimalStemFilter(filter);
   
           filter = new NGramTokenFilter(version, filter, minGram, maxGram);
           return new TokenStreamComponents(tokenizer, filter);
       }
   }
   ```
   
   Setup is then:
   
   ```c#
   var indexStore = new RAMDirectory();
   var indexConfig = new IndexWriterConfig(Version, Analyzer);
   indexWriter = new IndexWriter(indexStore, indexConfig);
   initialIndexingTask = Task.Run(() =>
                                                 {
                                                     var stopwatch = Stopwatch.StartNew();
                                                     indexWriter.AddDocuments(collection.Select(GetAndSubscribeToDocument));
                                                     indexWriter.Commit();
                                                     Debug.WriteLine(@$"{typeof(TDocument)} Indexing: {stopwatch.ElapsedMilliseconds}ms");
                                                 });
   ```
   
   Searching after initial indexing is complete is done with:
   
   ```c#
   using var reader = DirectoryReader.Open(indexWriter.Directory);
   var searcher = new IndexSearcher(reader);
   
   Query? parsedQuery;
   try
   {
       var queryParser = new MultiFieldQueryParser(Version, DefaultSearchFields, Analyzer);
       var terms = new HashSet<Term>();
       queryParser.Parse(query).Rewrite(reader).ExtractTerms(terms);
   
       var boolQuery = new BooleanQuery();
       terms.ForEach(t =>
                       {
                           boolQuery.Add(new FuzzyQuery(t), Occur.SHOULD);
                           boolQuery.Add(new WildcardQuery(t), Occur.SHOULD);
                       });
   
       parsedQuery = boolQuery;
   }
   catch (Exception)
   {
       // TODO: User feedback
       return new (TDocument doc, float score)[0];
   }
   
   var hits = searcher.Search(parsedQuery, resultLimit);
   ```
   
   I've archived off the dataset and code so that I can hopefully go back and gather more data to help troubleshoot. It's worth noting that in my current repro case, I have 4 separate instances of this (RAMDirectory, IndexWriter, and Reader+Searcher) all running at the same time (and with _nearly_ identical datasets). A quick look through the code up and down the stack trace didn't show me anything in Lucene that was obviously shared between those instances that could be the culprit.





[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-722128836


   @willson556 
   
   Thanks for submitting a working failure case. I was able to use it to create a test project that contained a failing test. From there, I was able to confirm that the error was introduced between 4.8.0-beta00007 and 4.8.0-beta00008, and by checking out commits in Git's detached HEAD mode I traced the issue to commit https://github.com/apache/lucenenet/commit/0eaf76540b8de326d1aa9ca24f4b5d6425a9ae38. Unfortunately, I had to start all over again at that point, since it was a merge of 60 commits, but eventually I ended up here: https://github.com/apache/lucenenet/commit/e1ead061df6ab5371979040ae8071b1bf8b18070.
   
   It turned out to be a simple misinterpretation that `id` means "unique", when in fact the object reference was the unique identifier that should be used in the `Equals()` implementation.
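   
   To illustrate the general pitfall described above (this is not the code from the linked commit), the sketch below contrasts an equality comparison based on a numeric `Id` with one based on object identity. The `Node`, `IdComparer`, `ReferenceComparer`, and `EqualityDemo` types are hypothetical and exist only for this example.
   
   ```c#
   using System;
   using System.Collections.Generic;
   using System.Runtime.CompilerServices;
   
   // Hypothetical stand-in for an object that carries a numeric id which is NOT globally unique.
   public sealed class Node
   {
       public int Id { get; }
       public Node(int id) => Id = id;
   }
   
   // Treats two nodes as equal whenever their ids match (the mistaken assumption).
   public sealed class IdComparer : IEqualityComparer<Node>
   {
       public bool Equals(Node x, Node y) => x?.Id == y?.Id;
       public int GetHashCode(Node n) => n.Id;
   }
   
   // Treats two nodes as equal only when they are the very same object.
   public sealed class ReferenceComparer : IEqualityComparer<Node>
   {
       public bool Equals(Node x, Node y) => ReferenceEquals(x, y);
       public int GetHashCode(Node n) => RuntimeHelpers.GetHashCode(n);
   }
   
   public static class EqualityDemo
   {
       // Counts how many distinct entries a hash set keeps for the given nodes.
       public static int CountDistinct(IEnumerable<Node> nodes, IEqualityComparer<Node> comparer)
           => new HashSet<Node>(nodes, comparer).Count;
   }
   ```
   
   With two separate `Node` instances that both have `Id` 0, `IdComparer` collapses them into one set entry while `ReferenceComparer` keeps both. A collection keyed on the id alone can therefore lose or conflate entries when ids repeat across independently built objects.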
   
   





[GitHub] [lucenenet] NightOwl888 closed issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 closed issue #296:
URL: https://github.com/apache/lucenenet/issues/296


   





[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-667715786


   Okay, I have fixed some culture sensitivity issues with the analyzers that could be leading to this. Could someone please check [the packages in the nuget artifact here](https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=1027&view=artifacts&type=publishedArtifacts) to see whether the `IndexOutOfRangeException` still exists?





[GitHub] [lucenenet] mlaufer edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-667973199


   I installed the NuGet artifact locally and the error seems to occur less often than before, but I'm still able to reproduce it:
   
   ```
   System.IndexOutOfRangeException: Index was outside the bounds of the array.
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32)
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify)
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance)
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm)
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init)
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions)
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts)
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector)
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original)
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query)
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n, Sort sort)
   ```
   
   With the added retry functionality, I wasn't able to produce 2 errors in a row using the same FuzzyQuery.





[GitHub] [lucenenet] thedugas commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
thedugas commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-693447089


   If it helps any:
   I get the same exception, with the same stack trace, when using WildcardQuery on a particular StringField (the field contains a string of ints). If I wrap the WildcardQuery in a single-item BooleanQuery, I do not experience the issue.





[GitHub] [lucenenet] mlaufer edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-667973199


   I installed the NuGet artifact locally and the error seems to occur less often than before, but I'm still able to reproduce it:
   
   ```
   System.IndexOutOfRangeException: Index was outside the bounds of the array.
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32)
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify)
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance)
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm)
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init)
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions)
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts)
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector)
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original)
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query)
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n, Sort sort)
   ```
   
   With the added retry functionality, I wasn't able to produce the error twice in a row using the same FuzzyQuery. So it seems to be nearly fixed, with only a small bug remaining.





[GitHub] [lucenenet] mlaufer commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-666504940


   Hi @NightOwl888,
   
   I can basically confirm the behavior described here when using FuzzyQuery. Most of the time it works, but sometimes searches fail with a very similar exception:
   
   `System.IndexOutOfRangeException: Index was outside the bounds of the array.
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32)
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify)
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance)
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm)
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init)
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions)
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts)
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector)
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original)
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query)
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n, Sort sort)`
   
   We are using Lucene 4.8. For now, we are "solving" this by wrapping the Search() call in a try/catch and retrying the search when the exception is caught, which greatly reduces the number of failed searches.
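   
   A retry wrapper of the kind described above might look roughly like the following sketch. `SearchRetry` and `Run` are hypothetical names, and the default of three attempts is an assumption rather than a value taken from this comment. It only masks the failure and does not address the underlying bug.
   
   ```c#
   using System;
   
   public static class SearchRetry
   {
       // Invokes 'search' up to maxAttempts times, retrying only when it throws
       // IndexOutOfRangeException. Any other exception propagates immediately.
       public static T Run<T>(Func<T> search, int maxAttempts = 3)
       {
           if (search == null) throw new ArgumentNullException(nameof(search));
           if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
   
           for (int attempt = 1; ; attempt++)
           {
               try
               {
                   return search();
               }
               catch (IndexOutOfRangeException) when (attempt < maxAttempts)
               {
                   // Swallow and loop again; on the final attempt the filter is false,
                   // so the exception reaches the caller.
               }
           }
       }
   }
   ```
   
   It could be called as `var hits = SearchRetry.Run(() => searcher.Search(query, 10));`, where `searcher` and `query` are whatever the application already uses.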





[GitHub] [lucenenet] AntonOttoW edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
AntonOttoW edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-721908527


   In my case, when doing a fuzzy search using 4.8.0-beta00012 and doing load testing I get the IndexOutOfRangeException with the following stack trace:
   
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32)
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify)
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance)
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm)
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init)
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions)
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts)
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector)
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.FilteredQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original)
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query)
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Filter filter, Int32 n, Sort sort)
   
   I'm running a thousand requests, ramped up over 60 seconds, and get an error rate of about 20 to 30 percent.
   
   I then included a retry whenever I catch this exception and have brought the error rate down to 1 to 2 percent. (I don't count the errors in the retries, only the requests that still hadn't succeeded after 3 attempts.)
   
   Interestingly, when I removed the fuzzy search, I was able to do 1000 successful requests with no issues.
   
   





[GitHub] [lucenenet] thedugas edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
thedugas edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-693447089


   If it helps any:
   I get the same exception, with the same stack trace, when using WildcardQuery on a particular StringField (the field contains a string of ints). If I wrap the WildcardQuery in a single-item BooleanQuery, I do not experience the issue. This seems to happen when I add a StringField that is the reverse of another.
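   
   The wrapping workaround mentioned above could be sketched as follows. `QueryWrapping` and `AsSingleClause` are hypothetical names, and the use of `Occur.MUST` is an assumption, since the comment does not say which occurrence type was used.
   
   ```c#
   using Lucene.Net.Index;
   using Lucene.Net.Search;
   
   public static class QueryWrapping
   {
       // Returns a BooleanQuery whose only clause is the given query.
       public static BooleanQuery AsSingleClause(Query inner)
       {
           var wrapper = new BooleanQuery();
           wrapper.Add(inner, Occur.MUST); // assumed occurrence type
           return wrapper;
       }
   }
   ```
   
   For example, `QueryWrapping.AsSingleClause(new WildcardQuery(new Term("Zip", "*06*")))` produces a query that can be passed to `IndexSearcher.Search` in place of the bare `WildcardQuery`.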





[GitHub] [lucenenet] thedugas commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
thedugas commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-696300662


   Example (Note: before adding reversed fields, the issue does not present itself):
   
   Analyzer: KeywordAnalyzer
   
   Field Definitions:
   new StringField("Address", string.Empty, Field.Store.YES)
   new StringField("Address" + "_Reversed", string.Empty, Field.Store.YES)
   new StringField("Zip", string.Empty, Field.Store.YES)
   new StringField("Zip" + "_Reversed", string.Empty, Field.Store.YES)
   
   Query:
   var query = new BooleanQuery
   {
       { new WildcardQuery(new Term("Address", "*hwy*")), Occur.MUST },
       { new WildcardQuery(new Term("Zip", "*06*")), Occur.MUST },
   };
   
   indexSearcher.Search(query, 10);
   
   NOTE: If I name the "reversed" columns "_Reversed" + Name, the issue goes away.
   
   I apologize, I no longer have the data to reproduce the exception. I rebuilt the index with a different name for the reversed columns and the issue seems to have gone away, and rebuilding with the problematic field names takes a long time...





[GitHub] [lucenenet] willson556 edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
willson556 edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-709546507


   > If a test is too much to ask, could you distill this down to a console app using the failing data set and put it in a repo to share?
   
   Yeah, I should be able to get that to you by the end of the week. Thanks for the prompt response!
   
   
   





[GitHub] [lucenenet] LunarExplorer removed a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
LunarExplorer removed a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-695352962







[GitHub] [lucenenet] jregnier commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
jregnier commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-665266574


   Any ideas on this?





[GitHub] [lucenenet] jregnier commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
jregnier commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-645421710


   Thanks for the quick response. I can't really supply sample data, since it could be many things; the data is very diverse. Hopefully the breakdown of my setup below will be enough.
   
   `// The analyzer uses a CharTokenizer with a LowerCaseFilter
   
   var dir = FSDirectory.Open(indexFolderPath);
   var indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, {analyzer});
   _writer = new IndexWriter(dir, indexConfig);
   
   var parentDocument = new Document();
   parentDocument.Add({BinaryDocValuesField});
   parentDocument.Add({StringField});
   parentDocument.Add({StringField});
   parentDocument.Add({StringField});
   
   var childDocument = new Document();
   childDocument.Add({StringField});
   childDocument.Add({StringField});
   childDocument.Add({TextField}); // not stored
   childDocument.Add({StringField}); // only some documents will have this
   
   // we are creating a parent child relationship with this list of documents
   _writer.AddDocuments(documentList);
   
   _reader = DirectoryReader.Open(FSDirectory.Open(indexFolderPath));
   _searcher = new IndexSearcher(_reader);
   BooleanQuery.MaxClauseCount = int.MaxValue;
   
   var searchString = "value:*test search string*";
   var terms = new SpanMultiTermQueryWrapper<WildcardQuery>(new WildcardQuery(new Term(fieldName, word))); // terms is a list of these, one for each word
   var childQuery = new SpanNearQuery(terms, 0, true);
   
   var parentFilter = new FixedBitSetCachingWrapperFilter(
   	new QueryWrapperFilter(
   		new TermQuery(
   			new Term(fieldName, value))));
   
   var query = new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.Max);
   
   var sort = new Sort(new SortField(null, SortFieldType.DOC));
   return _searcher.Search(query, _reader.NumDocs, sort);`





[GitHub] [lucenenet] mlaufer commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-667973199


   I installed the NuGet artifact locally. The error seems to occur less often than before, but I'm still able to reproduce it:
   
   ```
   System.IndexOutOfRangeException: Index was outside the bounds of the array.
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32)
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify)
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance)
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm)
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init)
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions)
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts)
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector)
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original)
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query)
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n, Sort sort)
   ```





[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-645058841


   Thanks for the report. The stack trace is helpful as it indicates an index read failure, but could you provide more sample setup code? It would be helpful if you could provide the following:
   
   1. Which Lucene version compatibility setting you are using
   2. Sample code to create an index (including field/analyzer setup)
   3. Sample query code to read the index
   4. Some sample data
   
   It is much more likely we will solve this if we have code that can be run to duplicate the conditions at the time of the exception, either as a standalone console app or a test.
   
   I suspect there may be a mismatch between the `BlockTreeTermsWriter` and the `BlockTreeTermsReader`. It may be unrelated, but there is a comment in the code [in the `BlockTreeTermsWriter`](https://github.com/apache/lucenenet/blob/a7f7c40895b156681beaea22e1da8f46e265a98c/src/Lucene.Net/Codecs/BlockTreeTermsWriter.cs#L438-L440) that indicates an index out of range exception when asserting the "floor blocks" data. Floor blocks are used if you have more than 48 terms in a block.





[GitHub] [lucenenet] AntonOttoW commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
AntonOttoW commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-721908527


   In my case, when load testing a fuzzy search, I get the IndexOutOfRangeException with the following stack trace:
   
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32)
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify)
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance)
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm)
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init)
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions)
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts)
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector)
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.FilteredQuery.Rewrite(IndexReader reader)
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original)
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query)
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Filter filter, Int32 n, Sort sort)
   
   I'm running a thousand requests, ramped up over 60 seconds, and get an error rate of about 20 to 30 percent.
   
   I then added a retry whenever I catch this exception, which brought the error rate down to 1 to 2 percent. (I don't count errors within the retries, only requests that still hadn't succeeded after 3 attempts.)
   
   Interestingly, when I removed the fuzzy search, I was able to run 1000 successful requests with no issues.
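   
   The retry workaround described above might look roughly like the following. This is only a sketch: the `SearchRetry` helper, its `maxAttempts` parameter, and the `search` delegate are illustrative names rather than anything from Lucene.NET, and retrying only masks the underlying bug.
   
   ```c#
   using System;
   
   public static class SearchRetry
   {
       // Invokes search up to maxAttempts times, retrying only on IndexOutOfRangeException.
       // The failure from the final attempt is rethrown so persistent errors stay visible.
       public static T Execute<T>(Func<T> search, int maxAttempts = 3)
       {
           if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
   
           for (int attempt = 1; ; attempt++)
           {
               try
               {
                   return search();
               }
               catch (IndexOutOfRangeException) when (attempt < maxAttempts)
               {
                   // Swallow and retry; a real application would log the attempt here.
               }
           }
       }
   
       public static void Main()
       {
           int calls = 0;
           // Stand-in for a searcher.Search(...) call: fails twice, then succeeds.
           int result = Execute(() =>
           {
               calls++;
               if (calls < 3) throw new IndexOutOfRangeException();
               return 42;
           });
           Console.WriteLine($"{result} after {calls} calls"); // 42 after 3 calls
       }
   }
   ```
   
   In real use the delegate would wrap the actual search call, for example `SearchRetry.Execute(() => searcher.Search(query, filter, n, sort))`.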
   
   





[GitHub] [lucenenet] NightOwl888 edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-668003921


   Thanks for the info. I suspect this is a different issue than the one from the OP.
   
   Can you tell me which version the error first appeared in? There have been some recent changes to both `Automaton` and `FuzzyTermsEnum` to improve performance and I am sure it can be narrowed to a few suspect commits pretty easily.





[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-665507225


   I traced an issue that was causing another `IndexOutOfRangeException` in the `ThaiTokenizer` to an invalid cast from `int` to `char` that was causing it to filter out surrogate pairs when it shouldn't have been. This is the second such issue I found this week, and searching through the analyzers for the string `(char)`, this appears to be a problem that affects several of them. This is definitely a bug that we will need to address.
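   
   A minimal sketch of the kind of cast described above. It is not the actual `ThaiTokenizer` code, and the class and method names are made up for illustration. In the default unchecked context, casting a supplementary code point to `char` silently keeps only the low 16 bits, while `char.ConvertFromUtf32` produces the full surrogate pair:
   
   ```c#
   using System;
   
   public static class SurrogateCastDemo
   {
       // Lossy: an unchecked int-to-char cast keeps only the low 16 bits.
       public static char LossyCast(int codePoint) => unchecked((char)codePoint);
   
       // Safe: one char for BMP code points, a surrogate pair for everything above U+FFFF.
       public static string SafeConvert(int codePoint) => char.ConvertFromUtf32(codePoint);
   
       public static void Main()
       {
           int codePoint = 0x1F600; // outside the Basic Multilingual Plane
   
           char truncated = LossyCast(codePoint);
           string full = SafeConvert(codePoint);
   
           Console.WriteLine(((int)truncated).ToString("X4"));           // F600
           Console.WriteLine(full.Length);                               // 2
           Console.WriteLine(char.IsSurrogatePair(full, 0));             // True
           Console.WriteLine(char.ConvertToUtf32(full, 0) == codePoint); // True
       }
   }
   ```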
   
   It might also be useful to know whether the problem you are seeing is happening in all cultures. In Java, none of the methods are culture-sensitive, so to match the behavior we should be using the invariant culture. .NET has [several methods that are culture-sensitive by default](https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings). While we have gone through to ensure we are not calling any of them in places where we shouldn't be, there could be a case or two that were missed or were recently added. If you switch the current thread to the invariant culture, does it cause the problem to go away?
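   
   One way to try that experiment is sketched below. The `InvariantCultureScope` helper is a made-up name, and the delegate passed to it is a stand-in for whatever code triggers the exception. It assumes a runtime where `CultureInfo.CurrentCulture` is settable (.NET Framework 4.6 or later, or .NET Core):
   
   ```c#
   using System;
   using System.Globalization;
   
   public static class InvariantCultureScope
   {
       // Runs the action with the current culture set to invariant,
       // then restores whatever cultures were active before.
       public static void Run(Action action)
       {
           CultureInfo previous = CultureInfo.CurrentCulture;
           CultureInfo previousUi = CultureInfo.CurrentUICulture;
           try
           {
               CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
               CultureInfo.CurrentUICulture = CultureInfo.InvariantCulture;
               action();
           }
           finally
           {
               CultureInfo.CurrentCulture = previous;
               CultureInfo.CurrentUICulture = previousUi;
           }
       }
   
       public static void Main()
       {
           // Hypothetical stand-in for the failing search call.
           Run(() => Console.WriteLine(CultureInfo.CurrentCulture.Name == "")); // True
       }
   }
   ```
   
   If the exception stops occurring inside `Run`, that would point toward a culture-sensitive call somewhere in the search path.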





[GitHub] [lucenenet] jregnier commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
jregnier commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-670708435


   Sorry, I've been off for a few days. Unfortunately, I'm not able to reproduce it on my side, so I can't really test it out.





[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-722508693


   BTW - if anyone wants to try out these changes before they are rolled into a release to confirm it is a complete fix, the NuGet packages can be downloaded from the `nuget` artifact here: https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=1171&view=artifacts&type=publishedArtifacts





[GitHub] [lucenenet] mlaufer commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-668009001


   We just recently implemented the FuzzyQuery on beta00008 and updated to beta00011, so the error could have happened in an earlier version. I will try a downgrade to an older version and check if the error still occurs and get back to you.





[GitHub] [lucenenet] willson556 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
willson556 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-708676417


   I am able to reliably reproduce this with one of my datasets, but I'm not sure I could write a failing test for it. I'm running on .NET Core/x64.
   
   Similar stack trace to everyone after OP:
   ```
      at Lucene.Net.Util.Automaton.UTF32ToUTF8.Convert(Automaton utf32) 
      at Lucene.Net.Util.Automaton.CompiledAutomaton..ctor(Automaton automaton, Nullable`1 finite, Boolean simplify) 
      at Lucene.Net.Search.FuzzyTermsEnum.InitAutomata(Int32 maxDistance) 
      at Lucene.Net.Search.FuzzyTermsEnum.GetAutomatonEnum(Int32 editDistance, BytesRef lastTerm) 
      at Lucene.Net.Search.FuzzyTermsEnum.MaxEditDistanceChanged(BytesRef lastTerm, Int32 maxEdits, Boolean init) 
      at Lucene.Net.Search.FuzzyTermsEnum..ctor(Terms terms, AttributeSource atts, Term term, Single minSimilarity, Int32 prefixLength, Boolean transpositions) 
      at Lucene.Net.Search.FuzzyQuery.GetTermsEnum(Terms terms, AttributeSource atts) 
      at Lucene.Net.Search.MultiTermQuery.RewriteMethod.GetTermsEnum(MultiTermQuery query, Terms terms, AttributeSource atts) 
      at Lucene.Net.Search.TermCollectingRewrite`1.CollectTerms(IndexReader reader, MultiTermQuery query, TermCollector collector) 
      at Lucene.Net.Search.TopTermsRewrite`1.Rewrite(IndexReader reader, MultiTermQuery query) 
      at Lucene.Net.Search.MultiTermQuery.Rewrite(IndexReader reader) 
      at Lucene.Net.Search.BooleanQuery.Rewrite(IndexReader reader) 
      at Lucene.Net.Search.IndexSearcher.Rewrite(Query original) 
      at Lucene.Net.Search.IndexSearcher.CreateNormalizedWeight(Query query) 
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Filter filter, Int32 n) 
      at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n) 
   ```
   Using this analyzer (I'm just starting to come up to speed with Lucene so I'm not sure the arrangement of filters actually makes any sense):
   ```c#
   public class NGramAnalyzer : Analyzer
   {
       private readonly LuceneVersion version;
       private readonly int minGram;
       private readonly int maxGram;
   
       public NGramAnalyzer(LuceneVersion version, int minGram = 2, int maxGram = 8)
       {
           this.version = version;
           this.minGram = minGram;
           this.maxGram = maxGram;
       }
   
       /// <inheritdoc />
       protected override TextReader InitReader(string fieldName, TextReader reader)
       {
           var charMap = new NormalizeCharMap.Builder();
           charMap.Add("_", " ");
           return new MappingCharFilter(charMap.Build(), reader);
       }
   
       /// <inheritdoc />
       protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
       {
           // Splits words at punctuation characters, removing punctuation.
           // Splits words at hyphens, unless there's a number in the token...
           // Recognizes email addresses and internet hostnames as one token.
           var tokenizer = new StandardTokenizer(version, reader);
   
           TokenStream filter = new StandardFilter(version, tokenizer);
   
           // Normalizes token text to lower case.
           filter = new LowerCaseFilter(version, filter);
   
           // Removes stop words from a token stream.
           filter = new StopFilter(version, filter, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
   
           filter = new EnglishMinimalStemFilter(filter);
   
           filter = new NGramTokenFilter(version, filter, minGram, maxGram);
           return new TokenStreamComponents(tokenizer, filter);
       }
   }
   ```
   
   Setup is then:
   
   ```c#
   var indexStore = new RAMDirectory();
   var indexConfig = new IndexWriterConfig(Version, Analyzer);
   indexWriter = new IndexWriter(indexStore, indexConfig);
   initialIndexingTask = Task.Run(() =>
   {
       var stopwatch = Stopwatch.StartNew();
       indexWriter.AddDocuments(collection.Select(GetAndSubscribeToDocument));
       indexWriter.Commit();
       Debug.WriteLine(@$"{typeof(TDocument)} Indexing: {stopwatch.ElapsedMilliseconds}ms");
   });
   ```
   
   Searching after initial indexing is complete is done with:
   
   ```c#
   using var reader = DirectoryReader.Open(indexWriter.Directory);
   var searcher = new IndexSearcher(reader);
   
   Query? parsedQuery;
   try
   {
       var queryParser = new MultiFieldQueryParser(Version, DefaultSearchFields, Analyzer);
       var terms = new HashSet<Term>();
       queryParser.Parse(query).Rewrite(reader).ExtractTerms(terms);
   
       var boolQuery = new BooleanQuery();
       terms.ForEach(t =>
       {
           boolQuery.Add(new FuzzyQuery(t), Occur.SHOULD);
           boolQuery.Add(new WildcardQuery(t), Occur.SHOULD);
       });
   
       parsedQuery = boolQuery;
   }
   catch (Exception)
   {
       // TODO: User feedback
       return new (TDocument doc, float score)[0];
   }
   
   var hits = searcher.Search(parsedQuery, resultLimit);
   ```
   
   I've archived off the dataset and code so that I can hopefully go back and gather more data to help troubleshoot. It's worth noting that in my current repro case, I have 4 separate instances of this (RAMDirectory, IndexWriter, and Reader+Searcher) all running at the same time (and with _nearly_ identical datasets). A quick look through the code up and down the stack trace didn't show me anything in Lucene that was obviously shared between those instances that could be the culprit.





[GitHub] [lucenenet] mlaufer commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
mlaufer commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-668034914


   I'm unable to reproduce the bug on beta00006; I will try beta00007 next. Hope this helps.





[GitHub] [lucenenet] willson556 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
willson556 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-709671340


   @NightOwl888 Repo is posted and I just invited you to it. The console app prompts you to enter a query. The suggested query provided in the prompt fails nearly every time for me.
   
   Thanks again!





[GitHub] [lucenenet] willson556 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
willson556 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-709546507


   > @willson556
   > If a test is too much to ask, could you distill this down to a console app using the failing data set and put it in a repo to share?
   
   Yeah, I should be able to get that to you by the end of the week. Thanks for the prompt response!
   
   
   





[GitHub] [lucenenet] NightOwl888 commented on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-671915691


   @mlaufer - Since you are able to reliably reproduce this, could you submit a PR with a test that fails (no matter how rarely) with this problem?





[GitHub] [lucenenet] thedugas edited a comment on issue #296: IndexOutOfRangeException when searching

Posted by GitBox <gi...@apache.org>.
thedugas edited a comment on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-696300662


   Platform = x64.
   Build = 4.8.0-beta00012
   Example (Note: before adding reversed fields, the issue does not present itself):
   
   Analyzer: KeywordAnalyzer
   
   Field Definitions:
   new StringField("Address", string.Empty, Field.Store.YES)
   new StringField("Address" + "_Reversed", string.Empty, Field.Store.YES)
   new StringField("Zip", string.Empty, Field.Store.YES)
   new StringField("Zip" + "_Reversed", string.Empty, Field.Store.YES)
   
   Query:
   var query = new BooleanQuery
   {
       { new WildcardQuery(new Term("Address", "*hwy*")), Occur.MUST },
       { new WildcardQuery(new Term("Zip", "*06*")), Occur.MUST },
   };
   
   indexSearcher.Search(query, 10);
   
   NOTE: If I name the "reversed" columns "_Reversed" + Name, the issue goes away.
   
   I apologize, but I no longer have the data to reproduce the exception. I rebuilt the index with a different name for the reversed columns, and the issue seems to have gone away; rebuilding with the problematic field names would take a long time.

