Posted to issues@lucene.apache.org by "Simon Willnauer (Jira)" <ji...@apache.org> on 2020/08/24 18:24:00 UTC

[jira] [Resolved] (LUCENE-9477) IndexWriter might leave broken segments file behind on exception during rollback

     [ https://issues.apache.org/jira/browse/LUCENE-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-9477.
-------------------------------------
    Fix Version/s: 8.7
                   master (9.0)
       Resolution: Fixed

> IndexWriter might leave broken segments file behind on exception during rollback
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-9477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9477
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Simon Willnauer
>            Priority: Major
>             Fix For: master (9.0), 8.7
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Mike ran some test beasting while I was working on LUCENE-8962. This test caused some headaches because it also fails on master, though only rarely:
> {noformat}
> org.apache.lucene.index.TestIndexWriterOnVMError > testUnknownError FAILED
>     org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MockIndexInputWrapper((clone of) ByteBuffersIndexInput (file=pending_segments_2, buffers=258 bytes, block size: 1, blocks: 1, position: 0))))
>         at __randomizedtesting.SeedInfo.seed([587A104EFE0C57E1:B32CCFCEFC8BC1D1]:0)
>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:300)
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:521)
>         at org.apache.lucene.util.TestUtil.checkIndex(TestUtil.java:301)
>         at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:836)
>         at org.apache.lucene.index.TestIndexWriterOnVMError.doTest(TestIndexWriterOnVMError.java:89)
>         at org.apache.lucene.index.TestIndexWriterOnVMError.testUnknownError(TestIndexWriterOnVMError.java:251)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
>         at java.base/java.lang.Thread.run(Thread.java:834)
>         Caused by:
>         java.io.FileNotFoundException: _0.si in dir=ByteBuffersDirectory@1bae3fe1 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@38275f41
>             at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:748)
>             at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
>             at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1044)
>             at org.apache.lucene.codecs.lucene86.Lucene86SegmentInfoFormat.read(Lucene86SegmentInfoFormat.java:91)
>             at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:364)
>             at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:298)
>             ... 41 more
>         ....
>   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.linedocsfile=/l/simon/lucene/test-framework/src/resources/org/apache/lucene/util/2000mb.txt.gz -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>   2> NOTE: leaving temporary files on disk at: /l/simon/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexWriterOnVMError_587A104EFE0C57E1-003
>   2> NOTE: test params are: codec=Asserting(Lucene86): {text_payloads=BlockTreeOrds(blocksize=128), text_vectors=PostingsFormat(name=Asserting), text1=PostingsFormat(name=Asserting), id=BlockTreeOrds(blocksize=128)}, docValues:{dv3=DocValuesFormat(name=Lucene80), dv2=DocValuesFormat(name=Asserting), dv5=DocValuesFormat(name=Lucene80), dv=DocValuesFormat(name=Asserting), dv4=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=696, maxMBSortInHeap=6.040673619645681, sim=Asserting(RandomSimilarity(queryNorm=false): {text_payloads=IB SPL-DZ(0.3), text_vectors=DFR I(ne)L3(800.0), text1=org.apache.lucene.search.similarities.BooleanSimilarity@6f4329a1}), locale=zh-CN, timezone=SystemV/MST7MDT
>   2> NOTE: Linux 5.5.6-arch1-1 amd64/Oracle Corporation 11.0.6 (64-bit)/cpus=128,threads=1,free=241525696,total=268435456
>   2> NOTE: All tests run in this JVM: [TestIndexWriterOnVMError]
> {noformat}
> The test also reproduces on master without the huge line docs file, using:
> {noformat}
> ant test  -Dtestcase=TestIndexWriterOnVMError -Dtests.method=testUnknownError -Dtests.seed=587A104EFE0C57E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=zh-CN -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {noformat}
> The reason is that we fail to delete the already-renamed pending segments file when the metadata sync on the directory fails. The subsequent rollback also crashes while trying to delete unreferenced files. As a result, later CheckIndex calls fail with FileNotFoundException, since the commit was written but not fully removed.
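
The cleanup described in the last paragraph can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the actual LUCENE-9477 patch: `CommitSketch`, `MemDir`, `finishCommit` and the `failSyncMetaData` flag are made-up names. The `MemDir` methods only loosely mirror Lucene's `Directory.rename`, `Directory.syncMetaData` and `Directory.deleteFile`, and the flag stands in for the error injection done by the test framework. The idea shown is: once `pending_segments_N` has been renamed to `segments_N`, a failing metadata sync must also remove `segments_N`, otherwise a non-durable commit point is left behind.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene's code): publish a commit by renaming
 * pending_segments_N to segments_N, then sync directory metadata. If anything
 * fails after the rename, segments_N is deleted again.
 */
final class CommitSketch {

  /** Hypothetical in-memory directory; method names loosely mirror Lucene's Directory. */
  static final class MemDir {
    final Map<String, byte[]> files = new HashMap<>();
    boolean failSyncMetaData = false; // simulated error injection

    void rename(String source, String dest) throws IOException {
      byte[] data = files.remove(source);
      if (data == null) {
        throw new IOException("no such file: " + source);
      }
      files.put(dest, data);
    }

    void syncMetaData() throws IOException {
      if (failSyncMetaData) {
        throw new IOException("simulated metadata sync failure");
      }
    }

    void deleteFile(String name) throws IOException {
      if (files.remove(name) == null) {
        throw new IOException("no such file: " + name);
      }
    }
  }

  static void finishCommit(MemDir dir, long generation) throws IOException {
    // Lucene file names encode the generation in base 36.
    String gen = Long.toString(generation, Character.MAX_RADIX);
    String src = "pending_segments_" + gen;
    String dest = "segments_" + gen;
    dir.rename(src, dest);
    boolean success = false;
    try {
      dir.syncMetaData();
      success = true;
    } finally {
      if (!success) {
        // The rename already made segments_N visible, but it is not durable,
        // so remove it before the original failure propagates.
        try {
          dir.deleteFile(dest);
        } catch (IOException ignored) {
          // Best effort: the sync failure is what the caller should see.
        }
      }
    }
  }

  public static void main(String[] args) {
    MemDir dir = new MemDir();
    dir.files.put("pending_segments_2", new byte[0]);
    dir.failSyncMetaData = true;
    try {
      finishCommit(dir, 2);
    } catch (IOException e) {
      System.out.println("commit failed: " + e.getMessage());
    }
    System.out.println("files left: " + dir.files.keySet());
  }
}
```

Running `main` prints `commit failed: simulated metadata sync failure` followed by `files left: []`. The `success` flag with `finally` (instead of `catch (IOException e)`) is a deliberate choice in this sketch: the cleanup then also runs when the failure is an `Error`, which is the kind of failure `TestIndexWriterOnVMError` injects.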



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
