Posted to commits@cassandra.apache.org by "Brandon Williams (Jira)" <ji...@apache.org> on 2022/02/10 20:25:00 UTC

[jira] [Commented] (CASSANDRA-17312) dtest-large.replace_address_test.TestReplaceAddress.test_restart_failed_replace (from Cassandra dtests)

    [ https://issues.apache.org/jira/browse/CASSANDRA-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490492#comment-17490492 ] 

Brandon Williams commented on CASSANDRA-17312:
----------------------------------------------

The issue here is that, under the conditions of this test, we no longer wait long enough on 3.0.

I suspect this test began failing when [this delay|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L970] was added: combined with the [sleep a bit later|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L1007], startup now takes too long for ccm's default wait_for_binary_proto timeout of 90 seconds.
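
To make the timing concrete, here is a minimal sketch that replays the deadline check from the stack trace quoted below, using the start and timeout values shown there. The helper names deadline_passed and to_utc, and the explicit now parameter, are added for illustration so the check can be evaluated at a fixed instant; ccm's own raise_if_passed reads time.time() directly. The log timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone

# Values copied from the stack trace in the issue description.
START = 1643238332.8472316   # when ccm began watching system.log
TIMEOUT = 90                 # seconds allowed for the watch
ELAPSED_AT_FAILURE = 90.12   # "after 90.12/90 seconds" in the error


def deadline_passed(start, timeout, now):
    """Same comparison as raise_if_passed, with the clock passed in."""
    return start + timeout < now


def to_utc(epoch_seconds):
    """Render an epoch timestamp as a UTC wall-clock string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


# Last log line quoted in the error's tail; assumed to be UTC.
last_log = datetime(2022, 1, 26, 23, 6, 41, 472000, tzinfo=timezone.utc)
seconds_into_watch = last_log.timestamp() - START

print("watch started:", to_utc(START))
print("deadline:     ", to_utc(START + TIMEOUT))
print("last log line at %.1f s into the watch" % seconds_into_watch)
print("timed out:", deadline_passed(START, TIMEOUT, START + ELAPSED_AT_FAILURE))
```

Under the UTC assumption, the last quoted log line lands roughly 68.6 seconds into the watch and nothing further had been logged when the timeout fired, which would be consistent with the node still sitting in one of the waits.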

3.11 is also dangerously close to exceeding the timeout after that addition, so the simplest fix is to increase the timeout to a safe amount, which I did [here|https://github.com/driftx/cassandra-dtest/tree/CASSANDRA-17312].  And [here's a circle run|https://app.circleci.com/pipelines/github/driftx/cassandra/354/workflows/3822627d-f02a-4eca-b870-997b7818ee95] showing the test passing repeatedly on 3.0.
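
The general shape of such a fix can be sketched without ccm: a log watcher whose timeout is a parameter the caller can raise. wait_for_log_line below is a hypothetical stand-in written for this illustration, not ccm's watch_log_for, and the 0.1 second polling interval is an arbitrary choice. The actual change is the one in the branch linked above.

```python
import time


def wait_for_log_line(path, needle, timeout, poll_interval=0.1):
    """Poll a log file until `needle` appears or `timeout` seconds pass.

    Hypothetical stand-in for a ccm-style log watch. Returns the number
    of seconds waited, or raises TimeoutError once the budget is spent.
    """
    start = time.monotonic()
    while True:
        try:
            with open(path, "r", errors="replace") as log:
                if any(needle in line for line in log):
                    return time.monotonic() - start
        except FileNotFoundError:
            pass  # the log may not exist yet early in startup
        elapsed = time.monotonic() - start
        if elapsed > timeout:
            raise TimeoutError(
                "after %.2f/%s seconds Missing: %r" % (elapsed, timeout, needle)
            )
        time.sleep(poll_interval)
```

A caller that knows startup includes extra waits would pass a larger timeout than the default instead of changing the watcher itself.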

> dtest-large.replace_address_test.TestReplaceAddress.test_restart_failed_replace (from Cassandra dtests)
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17312
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17312
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Josh McKenzie
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 3.0.x
>
>
> Consistently failing on 3.0.x
> https://ci-cassandra.apache.org/job/Cassandra-3.0/240/testReport/dtest-large.replace_address_test/TestReplaceAddress/test_restart_failed_replace_2/
> Failed 8 times in the last 16 runs. Flakiness: 73%, Stability: 50%
> Error Message
> ccmlib.node.TimeoutError: 26 Jan 2022 23:07:02 [replacement] after 90.12/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:  Head: INFO  [main] 2022-01-26 23:04:33,906 YamlConfigura  Tail: ...endingRangeCalculator:1] 2022-01-26 23:06:41,472 TokenMetadata.java:226 - Token -3193255413308472407 changing ownership from /127.0.0.3 to /127.0.0.4
> {code}
> Stacktrace
> self = <replace_address_test.TestReplaceAddress object at 0x7f99546197c0>
>     @since('2.2')
>     @pytest.mark.resource_intensive
>     def test_restart_failed_replace(self):
>         """
>             Test that if a node fails to replace, it can join the cluster even if the data is wiped.
>             """
> >       self._test_restart_failed_replace(mode='wipe')
> replace_address_test.py:479: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> replace_address_test.py:539: in _test_restart_failed_replace
>     self.replacement_node.start(jvm_args=["-Dcassandra.replace_address_first_boot={}"
> ../venv/lib/python3.8/site-packages/ccmlib/node.py:901: in start
>     self.wait_for_binary_interface(from_mark=self.mark)
> ../venv/lib/python3.8/site-packages/ccmlib/node.py:689: in wait_for_binary_interface
>     self.watch_log_for("Starting listening for CQL clients", **kwargs)
> ../venv/lib/python3.8/site-packages/ccmlib/node.py:588: in watch_log_for
>     TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> start = 1643238332.8472316, timeout = 90
> msg = "Missing: ['Starting listening for CQL clients'] not found in system.log:\n Head: INFO  [main] 2022-01-26 23:04:33,906...26 23:06:41,472 TokenMetadata.java:226 - Token -3193255413308472407 changing ownership from /127.0.0.3 to /127.0.0.4\n"
> node = 'replacement'
>     @staticmethod
>     def raise_if_passed(start, timeout, msg, node=None):
>         if start + timeout < time.time():
> >           raise TimeoutError.create(start, timeout, msg, node)
> E           ccmlib.node.TimeoutError: 26 Jan 2022 23:07:02 [replacement] after 90.12/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:
> E            Head: INFO  [main] 2022-01-26 23:04:33,906 YamlConfigura
> E            Tail: ...endingRangeCalculator:1] 2022-01-26 23:06:41,472 TokenMetadata.java:226 - Token -3193255413308472407 changing ownership from /127.0.0.3 to /127.0.0.4
> ../venv/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> This test can be run in isolation via 'pytest --force-resource-intensive-tests --cassandra-dir=~/cassandra replace_address_test.py::TestReplaceAddress::test_restart_failed_replace'



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
