Posted to commits@cassandra.apache.org by "David Capwell (Jira)" <ji...@apache.org> on 2021/11/02 18:35:00 UTC

[jira] [Commented] (CASSANDRA-17085) commit log was switched from non-daemon to daemon threads, which causes the JVM to exit in some case as no non-daemon threads are active

    [ https://issues.apache.org/jira/browse/CASSANDRA-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437534#comment-17437534 ] 

David Capwell commented on CASSANDRA-17085:
-------------------------------------------

[~samt] was working on another commit log issue, a separate race condition bug. I will ninja that patch into this one, since it touches the same code and rewriting the same interface twice is annoying.

The error is:

{code}
ERROR [COMMIT-LOG-WRITER] 2021-10-25 14:51:13,985 Exiting due to error while processing commit log during initialization.
org.apache.cassandra.io.FSWriteError: java.nio.channels.ClosedByInterruptException
	at org.apache.cassandra.db.commitlog.CompressedSegment.write(CompressedSegment.java:86)
	at org.apache.cassandra.db.commitlog.CommitLogSegment.sync(CommitLogSegment.java:360)
	at org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.sync(AbstractCommitLogSegmentManager.java:555)
	at org.apache.cassandra.db.commitlog.CommitLog.sync(CommitLog.java:253)
	at org.apache.cassandra.db.commitlog.AbstractCommitLogService$SyncRunnable.run(AbstractCommitLogService.java:178)
	at org.apache.cassandra.concurrent.InfiniteLoopExecutor.loop(InfiniteLoopExecutor.java:86)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedByInterruptException: null
	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
	at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
	at org.apache.cassandra.db.commitlog.CompressedSegment.write(CompressedSegment.java:78)
	... 7 common frames omitted
{code}
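The root cause in this trace is java.nio.channels.ClosedByInterruptException raised from FileChannelImpl.position(). FileChannel is an interruptible channel: if the thread performing an I/O operation on it is interrupted (or already has its interrupt flag set when the operation starts), the JDK closes the channel and throws ClosedByInterruptException. The channel then stays closed for every other user. The sketch below reproduces that behaviour in a single thread by setting the interrupt flag before calling position(). It is an illustration only: the class and method names are hypothetical, a temp file stands in for a commit log segment, and this comment does not say which thread interrupted COMMIT-LOG-WRITER in the real failure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class InterruptClosesChannel {
    /**
     * Hypothetical helper: returns true if position() failed with
     * ClosedByInterruptException and the channel was closed as a side effect.
     */
    static boolean interruptedPositionClosesChannel() {
        try {
            Path file = Files.createTempFile("segment", ".log"); // stand-in for a commit log segment
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                boolean threw = false;
                boolean openAfter = true;
                Thread.currentThread().interrupt(); // simulate an interrupt delivered to the writer thread
                try {
                    channel.position(); // same call that appears in the stack trace
                } catch (ClosedByInterruptException e) {
                    threw = true;
                    openAfter = channel.isOpen(); // the JDK closed the channel before throwing
                } finally {
                    Thread.interrupted(); // clear the interrupt flag so later code is unaffected
                }
                return threw && !openAfter;
            } finally {
                Files.deleteIfExists(file); // runs after try-with-resources has closed the channel
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(interruptedPositionClosesChannel()); // expected: true
    }
}
```

Because the channel is closed as a side effect, any later sync on the same segment would also fail, which is why an interrupt reaching the writer thread is fatal here rather than a transient error.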

> commit log was switched from non-daemon to daemon threads, which causes the JVM to exit in some case as no non-daemon threads are active
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17085
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17085
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.x
>
>
> Right now the bootstrap tests fail on every run; this work is to debug and fix the underlying issue.
> Examples:
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7089
> {code}
> >       node3.nodetool('bootstrap resume')
> bootstrap_test.py:1014: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:1005: in nodetool
>     return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', '-p', str(self.jmx_port)] + shlex.split(cmd))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> process = <subprocess.Popen object at 0x7fb071a03940>
> cmd_args = ['nodetool', '-h', 'localhost', '-p', '7300', 'bootstrap', ...]
>     def handle_external_tool_process(process, cmd_args):
>         out, err = process.communicate()
>         if (out is not None) and isinstance(out, bytes):
>             out = out.decode()
>         if (err is not None) and isinstance(err, bytes):
>             err = err.decode()
>         rc = process.returncode
>     
>         if rc != 0:
> >           raise ToolError(cmd_args, rc, out, err)
> E           ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost', '-p', '7300', 'bootstrap', 'resume'] exited with non-zero status; exit status: 1; 
> E           stderr: nodetool: Failed to connect to 'localhost:7300' - EOFException: 'null'.
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2305: ToolError
> {code}
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7087
> {code}
> >       node1.start()
> bootstrap_test.py:483: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:895: in start
>     node.watch_log_for_alive(self, from_mark=mark)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:664: in watch_log_for_alive
>     self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, filename=filename)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:592: in watch_log_for
>     head=reads[:50], tail="..."+reads[len(reads)-150:]))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> start = 1635453190.3118386, timeout = 120
> msg = "Missing: ['127.0.0.1:7000.* is now UP'] not found in system.log:\n Head: \n Tail: ..."
> node = 'node3'
>     @staticmethod
>     def raise_if_passed(start, timeout, msg, node=None):
>         if start + timeout < time.time():
> >           raise TimeoutError.create(start, timeout, msg, node)
> E           ccmlib.node.TimeoutError: 28 Oct 2021 20:35:10 [node3] after 120.12/120 seconds Missing: ['127.0.0.1:7000.* is now UP'] not found in system.log:
> E            Head: 
> E            Tail: ...
> {code}
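
The issue title relies on two JVM rules: the JVM exits when the only threads still running are daemon threads, and a new Thread inherits its daemon flag from the thread that constructs it unless setDaemon is called before start(). The sketch below shows both rules. The factory, class, and method names are hypothetical; this is not Cassandra's InfiniteLoopExecutor or the actual fix for this ticket, only the JDK behaviour the title refers to.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicBoolean;

class DaemonThreads {
    /** Hypothetical factory: sets the daemon flag explicitly instead of relying on inheritance. */
    static ThreadFactory factory(String name, boolean daemon) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(daemon); // must happen before start(), otherwise IllegalThreadStateException
            return t;
        };
    }

    /** Returns the daemon flag of a thread constructed inside a daemon thread without calling setDaemon. */
    static boolean childOfDaemonIsDaemon() {
        AtomicBoolean childDaemon = new AtomicBoolean();
        Thread parent = factory("parent", true).newThread(() -> {
            Thread child = new Thread(() -> { }); // never started; only its inherited flag is inspected
            childDaemon.set(child.isDaemon());
        });
        parent.start();
        try {
            parent.join(); // join also makes the parent's write visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return childDaemon.get();
    }

    public static void main(String[] args) {
        // A running non-daemon thread keeps the JVM alive after main returns; a daemon one does not.
        Thread writer = factory("COMMIT-LOG-WRITER", false).newThread(() -> { });
        System.out.println(writer.isDaemon());       // false
        // Without an explicit setDaemon call, the flag is inherited from the creating thread.
        System.out.println(childOfDaemonIsDaemon()); // true
    }
}
```

Setting the flag explicitly in the factory means a thread's lifetime semantics no longer depend on which thread happened to create it.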



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
