You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Berenguer Blasi (Jira)" <ji...@apache.org> on 2021/04/30 07:48:00 UTC
[jira] [Comment Edited] (CASSANDRA-16644) Flaky
TestTransientReplicationRing.test_move_backwards_and_cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-16644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337154#comment-17337154 ]
Berenguer Blasi edited comment on CASSANDRA-16644 at 4/30/21, 7:47 AM:
-----------------------------------------------------------------------
In order to follow the chain of events you need the logs [here|https://nightlies.apache.org/cassandra/trunk/Cassandra-trunk-dtest-novnode/403/Cassandra-trunk-dtest-novnode/label=cassandra-dtest,split=64/]
The test fails on
bq. ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:
But inspecting stdout logs we can see the node is started around 23:59:21
{noformat}
23:59:21,502 ccm INFO Starting node4 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.0, install_dir=/home/cassandra/cassandra
00:00:19,524 cassandra.cluster INFO New Cassandra host <Host: 127.0.0.4:9042 datacenter1> discovered
{noformat}
in node4 system logs we can see the actual message is really present in the log and the node has started:
{noformat}
INFO [main] 2021-04-27 00:00:18,493 PipelineConfigurator.java:125 - Starting listening for CQL clients on /127.0.0.4:9042 (unencrypted)...
{noformat}
We seem to have a failure where a message present at 00:00:18 is not being detected and the failure is triggered at 00:01:00 which makes no sense. I have followed the code looking for holes in the log {{mark}} logic but I didn't see any obvious problems. The test seems pretty solid looking at jenkins history, it is a test problem not a code problem, we have no previous tickets on it, there doesn't seem to be anything else to investigate,... so I am proposing closing this one and reopening if it raises again. Then I would add some further debug to have some thread we can start pulling.
was (Author: bereng):
In order to follow the chain of events you need the logs [here|https://nightlies.apache.org/cassandra/trunk/Cassandra-trunk-dtest-novnode/403/Cassandra-trunk-dtest-novnode/label=cassandra-dtest,split=64/]
The test fails on
bq. ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:
But inspecting stdout logs we can see the node is started around 23:59:21
{noformat}
23:59:21,502 ccm INFO Starting node4 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.0, install_dir=/home/cassandra/cassandra
00:00:19,524 cassandra.cluster INFO New Cassandra host <Host: 127.0.0.4:9042 datacenter1> discovered
{noformat}
in node4 system logs we can see the actual message is really present in the log and the node has started:
{noformat}
INFO [main] 2021-04-27 00:00:18,493 PipelineConfigurator.java:125 - Starting listening for CQL clients on /127.0.0.4:9042 (unencrypted)...
{noformat}
We seem to have a failure where a message present at 00:00:18 is not being detected and the failure is triggered at 00:01:00 which makes no sense. I have followed the code looking for holes in the log {{mark}} logic but I didn't see any obvious problems. The test seems pretty solid looking at jenkins history, it is a test problem not a code problem, we have no previous tickets on it, there doesn't seem to be anything else to investigate,... so I am proposing closing this one and reopening if it raises again. Then I would add some further debug to have some thread we start pulling.
> Flaky TestTransientReplicationRing.test_move_backwards_and_cleanup
> ------------------------------------------------------------------
>
> Key: CASSANDRA-16644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16644
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: Berenguer Blasi
> Assignee: Berenguer Blasi
> Priority: Normal
>
> Failure [here|https://ci-cassandra.apache.org/job/Cassandra-trunk/472/testReport/junit/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_move_backwards_and_cleanup/]
> {noformat}
> Error Message
> ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log: Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura
> Stacktrace
> self = <transient_replication_ring_test.TestTransientReplicationRing object at 0x7f66c51ebd90>
> @flaky(max_runs=1)
> @pytest.mark.no_vnodes
> def test_move_backwards_and_cleanup(self):
> """Test moving a node backwards without moving past a neighbor token"""
> move_token = '00005'
> expected_after_move = [gen_expected(range(0, 6), range(31, 40)),
> gen_expected(range(0, 21, 2)),
> gen_expected(range(1, 6, 2), range(6, 31)),
> gen_expected(range(7, 20, 2), range(21, 40))]
> expected_after_repair = [gen_expected(range(0, 6), range(31, 40)),
> gen_expected(range(0, 21)),
> gen_expected(range(6, 31)),
> gen_expected(range(21, 40))]
> > self.move_test(move_token, expected_after_move, expected_after_repair)
> transient_replication_ring_test.py:333:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> transient_replication_ring_test.py:235: in move_test
> node4.start(wait_for_binary_proto=True)
> transient_replication_ring_test.py:50: in new_start
> return old_start(*args, **kwargs)
> ../venv/src/ccm/ccmlib/node.py:901: in start
> self.wait_for_binary_interface(from_mark=self.mark)
> ../venv/src/ccm/ccmlib/node.py:687: in wait_for_binary_interface
> self.watch_log_for("Starting listening for CQL clients", **kwargs)
> ../venv/src/ccm/ccmlib/node.py:588: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> start = 1619481570.0236456, timeout = 90
> msg = "Missing: ['Starting listening for CQL clients'] not found in system.log:\nTail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura"
> node = 'node4'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> > raise TimeoutError.create(start, timeout, msg, node)
> E ccmlib.node.TimeoutError: 27 Apr 2021 00:01:00 [node4] after 90.11/90 seconds Missing: ['Starting listening for CQL clients'] not found in system.log:
> E Tail: INFO [main] 2021-04-26 23:59:22,590 YamlConfigura
> ../venv/src/ccm/ccmlib/node.py:56: TimeoutError
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org