Posted to commits@cassandra.apache.org by "Andres de la Peña (Jira)" <ji...@apache.org> on 2021/10/07 10:21:00 UTC

[jira] [Comment Edited] (CASSANDRA-16562) Fix flaky testSkipScrubCorruptedCounterRowWithTool

    [ https://issues.apache.org/jira/browse/CASSANDRA-16562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425437#comment-17425437 ] 

Andres de la Peña edited comment on CASSANDRA-16562 at 10/7/21, 10:20 AM:
--------------------------------------------------------------------------

It seems that the failures in {{ScrubTest}} are still there and the patches still apply after an easy rebase. The test failures only appear when compression is used. The fix used in CASSANDRA-16532 fixes the occasional failures in 3.11, as shown by these multiplexer runs:
||PR||CI||
|[16562-3.0|https://github.com/apache/cassandra/pull/1253]|[testsome|https://app.circleci.com/pipelines/github/adelapena/cassandra/966/workflows/ecefeef6-16b3-436c-b62f-8b76649dd864] [test-compression|https://app.circleci.com/pipelines/github/adelapena/cassandra/975/workflows/38f961fc-d98d-428e-93f5-2cdc7870b4f0]|
|[16562-3.11|https://github.com/apache/cassandra/pull/1254]|[testsome|https://app.circleci.com/pipelines/github/adelapena/cassandra/965/workflows/eed40245-98e2-4afc-b7a0-8bc93f819006] [test-compression|https://app.circleci.com/pipelines/github/adelapena/cassandra/974/workflows/cec11724-773a-453d-a3d2-acb28de7fdce]|
|[cassandra-3.0|https://github.com/apache/cassandra/tree/cassandra-3.0]|[testsome|https://app.circleci.com/pipelines/github/adelapena/cassandra/966/workflows/ecefeef6-16b3-436c-b62f-8b76649dd864] [test-compression|https://app.circleci.com/pipelines/github/adelapena/cassandra/975/workflows/38f961fc-d98d-428e-93f5-2cdc7870b4f0]|
|[cassandra-3.11|https://github.com/apache/cassandra/tree/cassandra-3.11]|[testsome|https://app.circleci.com/pipelines/github/adelapena/cassandra/965/workflows/eed40245-98e2-4afc-b7a0-8bc93f819006] [test-compression|https://app.circleci.com/pipelines/github/adelapena/cassandra/971/workflows/73804303-470a-4b15-8a0a-a5bde2794471]|

However, the situation is different for 3.0. In that case, {{testScrubCorruptedCounterRow}} has been [consistently failing|https://ci-cassandra.apache.org/view/Cassandra%203.0/job/Cassandra-3.0-test-compression/lastBuild/jdk=jdk_1.8_latest,label=cassandra,split=2/testReport/org.apache.cassandra.db/ScrubTest/testScrubCorruptedCounterRow/] for a while, and the fix from CASSANDRA-16532 doesn't change that. The test seems to get stuck in a deadlock and ends up failing with a timeout.

Bisecting shows that this test has been failing since CASSANDRA-14284, and indeed the test [passes|https://app.circleci.com/pipelines/github/adelapena/cassandra/976/workflows/a1e4ad08-c517-4a6b-a130-14f210af1545] if we temporarily revert the changes in {{CompressedRandomAccessReader}} done by that ticket. I'm still trying to figure out how verifying the checksum before decompressing is causing the deadlock.
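
For readers unfamiliar with the ordering in question, here is a minimal, hypothetical sketch of checking a chunk's checksum before handing it to the decompressor. It is not the actual {{CompressedRandomAccessReader}} code: the class name {{ChunkReaderSketch}}, the method names, and the use of CRC32 with zlib from {{java.util.zip}} are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; names and algorithms are assumptions,
// not Cassandra's implementation.
final class ChunkReaderSketch
{
    // CRC32 computed over the compressed bytes, i.e. the bytes as stored.
    static long checksum(byte[] compressed)
    {
        CRC32 crc = new CRC32();
        crc.update(compressed, 0, compressed.length);
        return crc.getValue();
    }

    // Helper that produces a compressed chunk to feed into readChunk.
    static byte[] compress(byte[] data)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[data.length * 2 + 64];
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    static byte[] readChunk(byte[] compressed, long storedChecksum, int uncompressedLength) throws IOException
    {
        // Step 1: verify the checksum first, so corrupted bytes never reach the decompressor.
        if (checksum(compressed) != storedChecksum)
            throw new IOException("Corrupted chunk: checksum mismatch");

        // Step 2: decompress only after the checksum has matched.
        Inflater inflater = new Inflater();
        try
        {
            inflater.setInput(compressed);
            byte[] out = new byte[uncompressedLength];
            int read = inflater.inflate(out);
            if (read != uncompressedLength)
                throw new IOException("Unexpected uncompressed length: " + read);
            return out;
        }
        catch (DataFormatException e)
        {
            throw new IOException("Corrupted chunk: invalid compressed data", e);
        }
        finally
        {
            inflater.end();
        }
    }
}
```

With this ordering, a corrupted chunk is rejected by the checksum comparison and the decompressor never sees it, so any caller that handles corruption has to cope with the failure surfacing at that earlier point.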

CC [~blerer]


> Fix flaky testSkipScrubCorruptedCounterRowWithTool
> --------------------------------------------------
>
>                 Key: CASSANDRA-16562
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16562
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Berenguer Blasi
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x
>
>
> See CASSANDRA-16532 where extra flakiness was detected on 3.0 and 3.11 branches for {{testSkipScrubCorruptedCounterRowWithTool}}


