You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Yifan Cai (Jira)" <ji...@apache.org> on 2020/05/11 23:18:00 UTC

[jira] [Comment Edited] (CASSANDRA-15797) Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest

    [ https://issues.apache.org/jira/browse/CASSANDRA-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104880#comment-17104880 ] 

Yifan Cai edited comment on CASSANDRA-15797 at 5/11/20, 11:17 PM:
------------------------------------------------------------------

PR: https://github.com/apache/cassandra/pull/588
Code: https://github.com/yifan-c/cassandra/tree/CASSANDRA-15797-Flaky-BinLogTest
Test: https://app.circleci.com/pipelines/github/yifan-c/cassandra?branch=CASSANDRA-15797-Flaky-BinLogTest

_update: unit and jvm dtest passed in java 8 and java 11_

*Failed tests*
* BinLogTest.testPutAfterStop: there is a chance that at the time of assertion, the consumer thread in BinLog has not yet drain the queue, so that assertion get the last NO_OP object and fail. The behavior is expected. The fix to the test is to assert none of the records get from the queue is the one inserted after stopping the binLog.
* BinLogTest.testBinLogStartStop: there is no barrier in the test to block assertion until records in the queue are consumed. Added the coundownlatch as the barrier. 

*Chronicle reference counter history trace*
The trace indicates the last release operation sees the reference counter is already 0, so that it prints the history. It is caused by the resources has already been released by the try-with-resources statement. And according to StoreComponentReferenceHandler#processWireQueue, *the exception can be ignored as the resources has already been released*. In this case, all test that calls readBinLogRecords can see the reference counter history.
Why the history trace is not always printed every time? Because test already ended and there is no time to print it. If adding a sleep time (5 seconds) at the end of the test, it is guaranteed to print reference count history. 


was (Author: yifanc):
PR: https://github.com/apache/cassandra/pull/588
Code: https://github.com/yifan-c/cassandra/tree/CASSANDRA-15797-Flaky-BinLogTest
Test: https://app.circleci.com/pipelines/github/yifan-c/cassandra?branch=CASSANDRA-15797-Flaky-BinLogTest

*Failed tests*
* BinLogTest.testPutAfterStop: there is a chance that at the time of assertion, the consumer thread in BinLog has not yet drain the queue, so that assertion get the last NO_OP object and fail. The behavior is expected. The fix to the test is to assert none of the records get from the queue is the one inserted after stopping the binLog.
* BinLogTest.testBinLogStartStop: there is no barrier in the test to block assertion until records in the queue are consumed. Added the coundownlatch as the barrier. 

*Chronicle reference counter history trace*
The trace indicates the last release operation sees the reference counter is already 0, so that it prints the history. It is caused by the resources has already been released by the try-with-resources statement. And according to StoreComponentReferenceHandler#processWireQueue, *the exception can be ignored as the resources has already been released*. In this case, all test that calls readBinLogRecords can see the reference counter history.
Why the history trace is not always printed every time? Because test already ended and there is no time to print it. If adding a sleep time (5 seconds) at the end of the test, it is guaranteed to print reference count history. 

> Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-15797
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15797
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Jon Meredith
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> An internal CI system is failing BinLogTest somewhat frequently under JDK11.  Configuration was recently changed to reduce the number of cores the tests run with, however it is reproducible on an 8 core laptop.
> {code}
> [junit-timeout] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [Junit-timeout] WARNING: An illegal reflective access operation has occurred
> [junit-timeout] WARNING: Illegal reflective access by net.openhft.chronicle.core.Jvm (file:/.../lib/chronicle-core-1.16.4.jar) to field java.nio.Bits.RESERVED_MEMORY
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> [junit-timeout] WARNING: All illegal access operations will be denied in a future release
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.895 sec
> [junit-timeout]
> [junit-timeout] Testcase: testPutAfterStop(org.apache.cassandra.utils.binlog.BinLogTest):	FAILED
> [junit-timeout] expected:<null> but was:<or...@5ed190be>
> [junit-timeout] junit.framework.AssertionFailedError: expected:<null> but was:<or...@5ed190be>
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.testPutAfterStop(BinLogTest.java:431)
> [junit-timeout]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit-timeout]         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [junit-timeout]         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> There's also a different failure under JDK8
> {code}
> junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.273 sec
> [junit-timeout]
> [junit-timeout] Testcase: testBinLogStartStop(org.apache.cassandra.utils.binlog.BinLogTest):	FAILED
> [junit-timeout] expected:<2> but was:<0>
> [junit-timeout] junit.framework.AssertionFailedError: expected:<2> but was:<0>
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.testBinLogStartStop(BinLogTest.java:172)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> Reproducer
> {code}
> PASSED=0; time  { while ant testclasslist -Dtest.classlistprefix=unit -Dtest.classlistfile=<(echo org/apache/cassandra/utils/binlog/BinLogTest.java); do PASSED=$((PASSED+1)); echo PASSED $PASSED; done }; echo FAILED after $PASSED runs.
> {code}
> In the last four attempts it has taken 31, 38, 27 and 10 rounds respectively under JDK11 and took 51 under JDK8 (about 15 minutes).
> I have not tried running in a cpu-limited container or anything like that yet.
> Additionally, this went past in the logs a few times (under JDK11).  No idea if it's just an artifact of weird test setup, or something more serious.
> {code}
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> [junit-timeout] WARNING: All illegal access operations will be denied in a future release
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.839 sec
> [junit-timeout]
> [junit-timeout] java.lang.Throwable: 1e53135d-main creation ref-count=1
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.newRefCountHistory(ReferenceCounter.java:45)
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.<init>(ReferenceCounter.java:35)
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.onReleased(ReferenceCounter.java:40)
> [junit-timeout]         at net.openhft.chronicle.bytes.AbstractBytes.<init>(AbstractBytes.java:38)
> [junit-timeout]         at net.openhft.chronicle.bytes.MappedBytes.<init>(MappedBytes.java:59)
> [junit-timeout]         at net.openhft.chronicle.bytes.MappedBytes.<init>(MappedBytes.java:55)
> [junit-timeout]         at net.openhft.chronicle.bytes.MappedBytes.mappedBytes(MappedBytes.java:113)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueStore.bytes(SingleChronicleQueueStore.java:358)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.resetWires(SingleChronicleQueueExcerpts.java:1702)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.cycle(SingleChronicleQueueExcerpts.java:1907)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.toStart(SingleChronicleQueueExcerpts.java:1614)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.createTailer(SingleChronicleQueue.java:401)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:495)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 1e53135d-main Reserve ref-count=2
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.recordResevation(ReferenceCounter.java:66)
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.reserve(ReferenceCounter.java:59)
> [junit-timeout]         at net.openhft.chronicle.bytes.AbstractBytes.reserve(AbstractBytes.java:464)
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounted.tryReserve(ReferenceCounted.java:81)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueStore.writeEOF(SingleChronicleQueueStore.java:461)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.checkMoveToNextCycle(SingleChronicleQueueExcerpts.java:1437)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycleNone(SingleChronicleQueueExcerpts.java:1416)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycle(SingleChronicleQueueExcerpts.java:1356)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:1168)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readDocument(SingleChronicleQueueExcerpts.java:1096)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:498)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 1e53135d-main Release ref-count=1
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.recordRelease(ReferenceCounter.java:88)
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:79)
> [junit-timeout]         at net.openhft.chronicle.bytes.AbstractBytes.release(AbstractBytes.java:469)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueStore.writeEOF(SingleChronicleQueueStore.java:463)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.checkMoveToNextCycle(SingleChronicleQueueExcerpts.java:1437)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycleNone(SingleChronicleQueueExcerpts.java:1416)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycle(SingleChronicleQueueExcerpts.java:1356)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:1168)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readDocument(SingleChronicleQueueExcerpts.java:1096)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:498)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 1e53135d-main Release ref-count=0
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.recordRelease(ReferenceCounter.java:88)
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:79)
> [junit-timeout]         at net.openhft.chronicle.bytes.AbstractBytes.release(AbstractBytes.java:469)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.close(SingleChronicleQueueExcerpts.java:1117)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.lambda$close$4(SingleChronicleQueue.java:537)
> [junit-timeout]         at java.base/java.util.WeakHashMap.forEach(WeakHashMap.java:1026)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.close(SingleChronicleQueue.java:537)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:506)
> [junit-timeout]         at org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 1e53135d-main/queue-thread-local-cleaner-daemon Release ref-count=0
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.recordRelease(ReferenceCounter.java:88)
> [junit-timeout]         at net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:74)
> [junit-timeout]         at net.openhft.chronicle.bytes.AbstractBytes.release(AbstractBytes.java:469)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.StoreComponentReferenceHandler.processWireQueue(StoreComponentReferenceHandler.java:74)
> [junit-timeout]         at net.openhft.chronicle.queue.impl.single.StoreComponentReferenceHandler.lambda$static$0(StoreComponentReferenceHandler.java:43)
> [junit-timeout]         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> [junit-timeout]         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> [junit-timeout]         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [junit-timeout]         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [junit-timeout]         at java.base/java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org