Posted to commits@cassandra.apache.org by "Bernardo Botella Corbi (Jira)" <ji...@apache.org> on 2022/02/17 17:26:00 UTC

[jira] [Commented] (CASSANDRA-17335) Flaky testNoSuchRepairSessionAnticompaction

    [ https://issues.apache.org/jira/browse/CASSANDRA-17335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494099#comment-17494099 ] 

Bernardo Botella Corbi commented on CASSANDRA-17335:
----------------------------------------------------

I found a fix for this issue. It is available here:

[https://github.com/apache/cassandra/compare/trunk...bbotella:CASSANDRA-17335-trunk]

 

As a backup, I am also attaching a patch.

[^0001-Fix-Flaky-testNoSuchRepairSessionAnticompaction-trunk.patch]

The problem is that the session state sometimes changed to FAILED between the moment it was checked as not failed and the moment it was updated, leading to an illegal transition (from FAILED to PREPARED).

I was able to reproduce it in two ways:
* Telling IntelliJ to keep running the same test until it failed. It always failed after 2 to 6 runs.
* Running the test in a loop from a simple bash script until it failed. Same results.

That allowed me to investigate the logs and track down the cause. After adding the synchronized block, I couldn’t make the test fail again. Thinking about it, it also makes sense to put that code into a synchronized block, since it reads a variable that is updated from other threads.
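
To illustrate the idea (this is a minimal sketch, not the code from the patch): the class, enum, and method names below are hypothetical stand-ins and do not match Cassandra's actual repair session classes. The sketch assumes FAILED is a terminal state, and it holds one lock across both the check and the update so that another thread cannot set FAILED in between.

```java
import java.util.concurrent.CountDownLatch;

public class SessionStateSketch {
    // Hypothetical states; the real session has more than these.
    public enum State { PREPARING, PREPARED, FAILED }

    // Hypothetical stand-in for a repair session, for illustration only.
    public static final class Session {
        private State state = State.PREPARING;

        // The check and the update run under the same lock, so no other
        // thread can move the session to FAILED between the two steps.
        public synchronized boolean markPreparedIfNotFailed() {
            if (state == State.FAILED) {
                return false; // FAILED is treated as terminal here
            }
            state = State.PREPARED;
            return true;
        }

        public synchronized void fail() {
            state = State.FAILED;
        }

        public synchronized State state() {
            return state;
        }
    }

    // Runs one "fail" thread and one "prepare" thread at the same time and
    // returns the final state. With the lock in place the result is FAILED
    // in either interleaving: prepare-then-fail or fail-then-rejected-prepare.
    public static State race() {
        Session session = new Session();
        CountDownLatch start = new CountDownLatch(1);
        Thread failer = new Thread(() -> { awaitQuietly(start); session.fail(); });
        Thread preparer = new Thread(() -> { awaitQuietly(start); session.markPreparedIfNotFailed(); });
        failer.start();
        preparer.start();
        start.countDown();
        try {
            failer.join();
            preparer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return session.state();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            if (race() != State.FAILED) {
                throw new AssertionError("FAILED was overwritten on run " + i);
            }
        }
        System.out.println("FAILED was never overwritten");
    }
}
```

Without the synchronized keyword on markPreparedIfNotFailed, the other thread could call fail() after the check but before the assignment, which is the FAILED to PREPARED overwrite described above.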

> Flaky testNoSuchRepairSessionAnticompaction
> -------------------------------------------
>
>                 Key: CASSANDRA-17335
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17335
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/java
>            Reporter: Andres de la Peña
>            Priority: Normal
>         Attachments: 0001-Fix-Flaky-testNoSuchRepairSessionAnticompaction-trunk.patch
>
>
> The in-JVM dtest {{RepairErrorsTest#testNoSuchRepairSessionAnticompaction}} seems to be flaky, as shown by [this repeated run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1280/workflows/8a4e04cb-64cc-46a3-9e1e-c946dfafc7fa/jobs/12114] on trunk, which hits 18 failures in 500 iterations. The config for CircleCI was generated with:
> {code}
> .circleci/generate.sh -m \
>   -e REPEATED_UTEST_TARGET=test-jvm-dtest-some \
>   -e REPEATED_UTEST_COUNT=500 \
>   -e REPEATED_UTEST_CLASS=org.apache.cassandra.distributed.test.RepairErrorsTest
> {code}
> This was discovered while testing CASSANDRA-16878, on [this CI run|https://app.circleci.com/pipelines/github/adelapena/cassandra/1268/workflows/aef1c703-c816-40f8-8e07-9055027d6403/jobs/12000].
> The error consists of a failed assertion when grepping the logs for an error message.


