You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Berenguer Blasi (Jira)" <ji...@apache.org> on 2021/04/30 07:48:00 UTC

[jira] [Comment Edited] (CASSANDRA-16644) Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup

    [ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337154#comment-17337154 ] 

Berenguer Blasi edited comment on CASSANDRA-16644 at 4/30/21, 7:47 AM:
-----------------------------------------------------------------------

In order to follow the chain of events you need the logs [here|https://nightlies.apache.org/cassandra/trunk/Cassandra-trunk-dtest-novnode/403/Cassandra-trunk-dtest-novnode/label=cassandra-dtest,split=64/]

The test fails on

bq. ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:

But inspecting stdout logs we can see the node is started around 23:59:21

{noformat}
23:59:21,502 ccm INFO Starting node4 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.0, install_dir=/home/cassandra/cassandra
00:00:19,524 cassandra.cluster INFO New Cassandra host <Host: 127.0.0.4:9042 datacenter1> discovered
{noformat}

in node4 system logs we can see the actual message is really present in the log and the node has started:

{noformat}
INFO  [main] 2021-04-27 00:00:18,493 PipelineConfigurator.java:125 - Starting listening for CQL clients on /127.0.0.4:9042 (unencrypted)...
{noformat}

We seem to have a failure where a message present at 00:00:18 is not being detected and the failure is triggered at 00:01:00 which makes no sense. I have followed the code looking for holes in the log {{mark}} logic but I didn't see any obvious problems. The test seems pretty solid looking at jenkins history, it is a test problem not a code problem, we have no previous tickets on it, there doesn't seem to be anything else to investigate,... so I am proposing closing this one and reopening if it raises again. Then I would add some further debug to have some thread we can start pulling.


was (Author: bereng):
In order to follow the chain of events you need the logs [here|https://nightlies.apache.org/cassandra/trunk/Cassandra-trunk-dtest-novnode/403/Cassandra-trunk-dtest-novnode/label=cassandra-dtest,split=64/]

The test fails on

bq. ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:

But inspecting stdout logs we can see the node is started around 23:59:21

{noformat}
23:59:21,502 ccm INFO Starting node4 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.0, install_dir=/home/cassandra/cassandra
00:00:19,524 cassandra.cluster INFO New Cassandra host <Host: 127.0.0.4:9042 datacenter1> discovered
{noformat}

in node4 system logs we can see the actual message is really present in the log and the node has started:

{noformat}
INFO  [main] 2021-04-27 00:00:18,493 PipelineConfigurator.java:125 - Starting listening for CQL clients on /127.0.0.4:9042 (unencrypted)...
{noformat}

We seem to have a failure where a message present at 00:00:18 is not being detected and the failure is triggered at 00:01:00 which makes no sense. I have followed the code looking for holes in the log {{mark}} logic but I didn't see any obvious problems. The test seems pretty solid looking at jenkins history, it is a test problem not a code problem, we have no previous tickets on it, there doesn't seem to be anything else to investigate,... so I am proposing closing this one and reopening if it raises again. Then I would add some further debug to have some thread we start pulling.

> Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-16644
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16644
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>
> Failure [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/]
> {noformat}
> Error Message
> ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log: Tail: INFO  [main] 2021-04-26 23:59:22,590 YamlConfigura
> Stacktrace
> self = <transient_replication_ring_test.TestTransientReplicationRing object at 0x7f66c51ebd90>
>     @flaky(max_runs=1)
>     @pytest.mark.no_vnodes
>     def test_move_backwards_and_cleanup(self):
>         """Test moving a node backwards without moving past a neighbor token"""
>         move_token = '00005'
>         expected_after_move = [gen_expected(range(0, 6), range(31, 40)),
>                                gen_expected(range(0, 21, 2)),
>                                gen_expected(range(1, 6, 2), range(6, 31)),
>                                gen_expected(range(7, 20, 2), range(21, 40))]
>         expected_after_repair = [gen_expected(range(0, 6), range(31, 40)),
>                                  gen_expected(range(0, 21)),
>                                  gen_expected(range(6, 31)),
>                                  gen_expected(range(21, 40))]
> >       self.move_test(move_token, expected_after_move, expected_after_repair)
> transient_replication_ring_test.py:333: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> transient_replication_ring_test.py:235: in move_test
>     node4.start(wait_for_binary_proto=True)
> transient_replication_ring_test.py:50: in new_start
>     return old_start(*args, **kwargs)
> ../venv/src/ccm/ccmlib/node.py:901: in start
>     self.wait_for_binary_interface(from_mark=self.mark)
> ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface
>     self.watch_log_for("Starting listening for CQL clients", **kwargs)
> ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for
>     TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> start = 1619481570.0236456, timeout = 90
> msg = "Missing: ['Starting listening for CQL clients'] not found in system.log:\nTail: INFO  [main] 2021-04-26 23:59:22,590 YamlConfigura"
> node = 'node4'
>     @staticmethod
>     def raise_if_passed(start, timeout, msg, node=None):
>         if start + timeout < time.time():
> >           raise TimeoutError.create(start, timeout, msg, node)
> E           ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:
> E           Tail: INFO  [main] 2021-04-26 23:59:22,590 YamlConfigura
> ../venv/src/ccm/ccmlib/node.py:56: TimeoutError
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org