You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "David Capwell (Jira)" <ji...@apache.org> on 2021/11/01 19:32:00 UTC

[jira] [Commented] (CASSANDRA-17085) Fix python dtests bootstrap_test.py::TestBootstrap

    [ https://issues.apache.org/jira/browse/CASSANDRA-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436995#comment-17436995 ] 

David Capwell commented on CASSANDRA-17085:
-------------------------------------------

Here is what I am seeing in bootstrap_test.py::TestBootstrap::test_bootstrap_with_reset_bootstrap_state

{code}
      node3 = new_node(cluster)
        try:
            node3.start()
        except NodeError:
            pass  # node doesn't start as expected
        t.join()
        node1.start()
{code}

node1.start checks all the alive nodes (according to ccm) to see if node1 is seen as up in the logs.  node3 is dead (or dying), so it should not be included in the watch set

> Fix python dtests bootstrap_test.py::TestBootstrap
> --------------------------------------------------
>
>                 Key: CASSANDRA-17085
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17085
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: David Capwell
>            Priority: Normal
>             Fix For: 4.x
>
>
> Right now bootstrap tests are failing every time we run, this work is to debug and fix the underling issue.
> Examples:
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7089
> {code}
> >       node3.nodetool('bootstrap resume')
> bootstrap_test.py:1014: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:1005: in nodetool
>     return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', '-p', str(self.jmx_port)] + shlex.split(cmd))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> process = <subprocess.Popen object at 0x7fb071a03940>
> cmd_args = ['nodetool', '-h', 'localhost', '-p', '7300', 'bootstrap', ...]
>     def handle_external_tool_process(process, cmd_args):
>         out, err = process.communicate()
>         if (out is not None) and isinstance(out, bytes):
>             out = out.decode()
>         if (err is not None) and isinstance(err, bytes):
>             err = err.decode()
>         rc = process.returncode
>     
>         if rc != 0:
> >           raise ToolError(cmd_args, rc, out, err)
> E           ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost', '-p', '7300', 'bootstrap', 'resume'] exited with non-zero status; exit status: 1; 
> E           stderr: nodetool: Failed to connect to 'localhost:7300' - EOFException: 'null'.
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2305: ToolError
> {code}
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7087
> {code}
> >       node1.start()
> bootstrap_test.py:483: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:895: in start
>     node.watch_log_for_alive(self, from_mark=mark)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:664: in watch_log_for_alive
>     self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, filename=filename)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:592: in watch_log_for
>     head=reads[:50], tail="..."+reads[len(reads)-150:]))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> start = 1635453190.3118386, timeout = 120
> msg = "Missing: ['127.0.0.1:7000.* is now UP'] not found in system.log:\n Head: \n Tail: ..."
> node = 'node3'
>     @staticmethod
>     def raise_if_passed(start, timeout, msg, node=None):
>         if start + timeout < time.time():
> >           raise TimeoutError.create(start, timeout, msg, node)
> E           ccmlib.node.TimeoutError: 28 Oct 2021 20:35:10 [node3] after 120.12/120 seconds Missing: ['127.0.0.1:7000.* is now UP'] not found in system.log:
> E            Head: 
> E            Tail: ...
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org