You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Ferenczi Jim (JIRA)" <ji...@apache.org> on 2016/11/23 19:03:58 UTC

[jira] [Commented] (LUCENE-7569) TestIndexSorting failures

    [ https://issues.apache.org/jira/browse/LUCENE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691060#comment-15691060 ] 

Ferenczi Jim commented on LUCENE-7569:
--------------------------------------

Thanks [~sarowe@syr.edu]. We've investigated this with [~jpountz] and found the issue. The index sorting during the merge uses a SortingLeafReader which uses a cache per thread for the doc value readers. This breaks the merging of SortedSetDocValues and SortedNumericDocValues since we need to iterate two instances in parallel of these doc values during the merge (see DocValuesConsumer.mergeSortedNumericField). [~jpountz]'s idea to fix this bug is to rewrite the SortingLeafReader to a SortingCodecReader. The SortingCodecReader would never use the same instance when creating a DocValues reader. The bug is only in 6x since in master the doc value readers are now iterators that are re-created when getDocValues is called.
The second issue about already sorted index is just a test bug that appears when the random merge policy picks a merge factor of 2. 
I'll send a patch for these two issues shortly. The first issue is problematic though because it means that indices that use index sorting (on any field, multivalued or not) in 6x are not able to merge multivalued doc values properly. 

> TestIndexSorting failures
> -------------------------
>
>                 Key: LUCENE-7569
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7569
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Steve Rowe
>            Assignee: Michael McCandless
>            Priority: Blocker
>             Fix For: master (7.0), 6.4
>
>
> My Jenkins found two reproducing seeds on branch_6x - these look different, but the failures happened on consecutive nightly runs:
> {noformat}
> Checking out Revision 535bf59a3b239f5c7bcd8c00f3e452c9b5e9b539 (refs/remotes/origin/branch_6x)
> [...]
>   [junit4] Suite: org.apache.lucene.index.TestIndexSorting
>    [junit4]   2> ??? 18, 2016 9:50:39 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
>    [junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #0,5,TGRP-TestIndexSorting]
>    [junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError: nextValue=4594289799775307848 vs previous=4606302611760746829
>    [junit4]   2>        at __randomizedtesting.SeedInfo.seed([5F8898DCABBFD056]:0)
>    [junit4]   2>        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:668)
>    [junit4]   2>        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:648)
>    [junit4]   2> Caused by: java.lang.AssertionError: nextValue=4594289799775307848 vs previous=4606302611760746829
>    [junit4]   2>        at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat$AssertingDocValuesConsumer.addSortedNumericField(AssertingDocValuesFormat.java:152)
>    [junit4]   2>        at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
>    [junit4]   2>        at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
>    [junit4]   2>        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
>    [junit4]   2>        at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
>    [junit4]   2>        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
>    [junit4]   2>        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4320)
>    [junit4]   2>        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3897)
>    [junit4]   2>        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>    [junit4]   2>        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
>    [junit4]   2> 
>    [junit4]   2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexSorting -Dtests.method=testRandom3 -Dtests.seed=5F8898DCABBFD056 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=he -Dtests.timezone=Canada/Central -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
>    [junit4] ERROR   14.9s J5  | TestIndexSorting.testRandom3 <<<
>    [junit4]    > Throwable #1: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:748)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:762)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1566)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1315)
>    [junit4]    >        at org.apache.lucene.index.TestIndexSorting.testRandom3(TestIndexSorting.java:2019)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
>    [junit4]    > Caused by: java.lang.AssertionError: nextValue=4594289799775307848 vs previous=4606302611760746829
>    [junit4]    >        at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat$AssertingDocValuesConsumer.addSortedNumericField(AssertingDocValuesFormat.java:152)
>    [junit4]    >        at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
>    [junit4]    >        at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
>    [junit4]    >        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
>    [junit4]    >        at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
>    [junit4]    >        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4320)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3897)
>    [junit4]    >        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>    [junit4]    >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1115, name=Lucene Merge Thread #0, state=RUNNABLE, group=TGRP-TestIndexSorting]
>    [junit4]    > Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError: nextValue=4594289799775307848 vs previous=4606302611760746829
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([5F8898DCABBFD056]:0)
>    [junit4]    >        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:668)
>    [junit4]    >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:648)
>    [junit4]    > Caused by: java.lang.AssertionError: nextValue=4594289799775307848 vs previous=4606302611760746829
>    [junit4]    >        at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat$AssertingDocValuesConsumer.addSortedNumericField(AssertingDocValuesFormat.java:152)
>    [junit4]    >        at org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:470)
>    [junit4]    >        at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:243)
>    [junit4]    >        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:153)
>    [junit4]    >        at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:167)
>    [junit4]    >        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4320)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3897)
>    [junit4]    >        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
>    [junit4]    >        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
>    [junit4]   2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/jobs/Lucene-Solr-Nightly-6.x/workspace/lucene/build/core/test/J5/temp/lucene.index.TestIndexSorting_5F8898DCABBFD056-001
>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {docs=PostingsFormat(name=Direct), norms=PostingsFormat(name=Direct), positions=PostingsFormat(name=Memory doPackFST= false), id=PostingsFormat(name=Direct), term_vectors=PostingsFormat(name=LuceneFixedGap)}, docValues:{multi_valued_long=DocValuesFormat(name=Lucene54), double=DocValuesFormat(name=Direct), foo=DocValuesFormat(name=Lucene54), numeric=DocValuesFormat(name=Direct), positions=DocValuesFormat(name=Lucene54), multi_valued_numeric=DocValuesFormat(name=Asserting), float=DocValuesFormat(name=Memory), int=DocValuesFormat(name=Asserting), long=DocValuesFormat(name=Memory), points=DocValuesFormat(name=Asserting), sorted=DocValuesFormat(name=Direct), multi_valued_double=DocValuesFormat(name=Asserting), docs=DocValuesFormat(name=Asserting), multi_valued_string=DocValuesFormat(name=Asserting), norms=DocValuesFormat(name=Asserting), bytes=DocValuesFormat(name=Asserting), binary=DocValuesFormat(name=Direct), id=DocValuesFormat(name=Asserting), multi_valued_int=DocValuesFormat(name=Direct), multi_valued_bytes=DocValuesFormat(name=Direct), multi_valued_float=DocValuesFormat(name=Memory), term_vectors=DocValuesFormat(name=Direct)}, maxPointsInLeafNode=1312, maxMBSortInHeap=5.376962888308618, sim=RandomSimilarity(queryNorm=true,coord=yes): {positions=DFR I(ne)3(800.0), id=IB LL-L3(800.0), term_vectors=DFR I(ne)B3(800.0)}, locale=he, timezone=Canada/Central
>    [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=1,free=132261744,total=508559360
>    [junit4]   2> NOTE: All tests run in this JVM: [TestFixedBitSet, TestSearchAfter, TestMinShouldMatch2, TestStopFilter, TestMmapDirectory, TestAutomaton, TestTransactions, TestMultiLevelSkipList, TestForUtil, TestFieldType, TestIntBlockPool, TestNumericRangeQuery64, TestBoolean2, Test2BPositions, TestDemo, TestSpanBoostQuery, TestSpanMultiTermQueryWrapper, TestCharsRefBuilder, TestIOUtils, TestAllFilesHaveChecksumFooter, TestMaxTermFrequency, TestConjunctionDISI, TestIndexWriterMerging, TestPackedInts, TestLucene50StoredFieldsFormat, TestGeoEncodingUtils, TestGeoUtils, Test2BPoints, Test2BSortedDocValuesFixedSorted, TestDocValues, TestExceedMaxTermLength, TestExitableDirectoryReader, TestFilterDirectoryReader, TestIndexSorting]
>    [junit4] Completed [345/441 (1!)] on J5 in 39.46s, 43 tests, 1 error <<< FAILURES!
> {noformat}
> {noformat}
> Checking out Revision 500f6c7875be31c34ca68c58f850b7e64fd049a9 (refs/remotes/origin/branch_6x)
> [...]
>    [junit4] Suite: org.apache.lucene.index.TestIndexSorting
>    [junit4]   2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexSorting -Dtests.method=testRandom3 -Dtests.seed=5D1E2D97B0777D5C -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=fi-FI -Dtests.timezone=America/Tegucigalpa -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR    245s J3  | TestIndexSorting.testRandom3 <<<
>    [junit4]    > Throwable #1: java.lang.RuntimeException: segment has indexSort=<float: "float">! missingValue=0.38226813,<sortedset: "multi_valued_bytes">! selector=MIN,<int: "id"> but docID=75 sorts after docID=76
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([5D1E2D97B0777D5C:FFC6634DD485545A]:0)
>    [junit4]    >        at org.apache.lucene.index.CheckIndex.testSort(CheckIndex.java:876)
>    [junit4]    >        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:756)
>    [junit4]    >        at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:300)
>    [junit4]    >        at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:865)
>    [junit4]    >        at org.apache.lucene.util.IOUtils.close(IOUtils.java:89)
>    [junit4]    >        at org.apache.lucene.util.IOUtils.close(IOUtils.java:76)
>    [junit4]    >        at org.apache.lucene.index.TestIndexSorting.testRandom3(TestIndexSorting.java:2087)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
>    [junit4]   2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/jobs/Lucene-Solr-Nightly-6.x/workspace/lucene/build/core/test/J3/temp/lucene.index.TestIndexSorting_5D1E2D97B0777D5C-001
>    [junit4]   2> NOTE: test params are: codec=CheapBastard, sim=RandomSimilarity(queryNorm=true,coord=no): {positions=DFR I(F)BZ(0.3), id=IB SPL-DZ(0.3), term_vectors=DFR I(ne)3(800.0)}, locale=fi-FI, timezone=America/Tegucigalpa
>    [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=1,free=252779544,total=525336576
>    [junit4]   2> NOTE: All tests run in this JVM: [TestBooleanRewrites, TestSearchForDuplicates, TestSwappedIndexFiles, TestClassicSimilarity, TestNeverDelete, TestAllFilesCheckIndexHeader, TestBooleanScorer, TestPayloads, TestDelegatingAnalyzerWrapper, TestWeakIdentityMap, TestRollingBuffer, TestParallelCompositeReader, TestExitableDirectoryReader, TestCollectionUtil, TestLegacyNumericUtils, TestBagOfPostings, TestNot, TestPhrasePrefixQuery, TestSimpleExplanationsWithFillerDocs, TestPriorityQueue, TestBlockPostingsFormat2, TestIntroSorter, FiniteStringsIteratorTest, TestIndexWriterReader, TestNumericRangeQuery64, TestDeletionPolicy, TestBooleanOr, TestSloppyPhraseQuery, TestNRTThreads, TestPolygon, TestPolygon2D, TestDocValues, TestFilterDirectoryReader, TestIndexSorting]
>    [junit4] Completed [431/441 (1!)] on J3 in 263.65s, 43 tests, 1 error <<< FAILURES!
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org