You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jordan West (Jira)" <ji...@apache.org> on 2019/11/12 17:53:00 UTC
[jira] [Commented] (CASSANDRA-15295) Running into deadlock when do
CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972662#comment-16972662 ]
Jordan West commented on CASSANDRA-15295:
-----------------------------------------
Hi [~djoshi] , I like the approach to removing the potential of re-introducing the bug. However, it seems like {{CommitLog}} was changed only partially to be thread-safe. For example,
concurrent calls to {{#start()}} and {{#shutdownBlocking()}} could leave the {{CommitLog}} in an invalid state: If the thread executing {{#start()}} pauses before calling {{executor.start()}}, and resumes
only after a separate thread executing {{#shutdownBlocking}} calls {{executor.shutdown()}} and {{executor.awaitTermination}} (which will immediately exit since {{thread == null}}), the {{segmentManager}} will be shutdown but the {{executor}} will still be running.
Is it necessary to make {{CommitLog}} thread-safe? Removing the singleton doesn’t really change the odds of it being used from multiple threads and the original version wasn’t thread-safe either w.r.t to these functions.
Some other comments/minor nits:
* I like moving the factory code to a function to reduce the amount of new code
* Is it necessary to change AbstractCommitLogSegmentManager#shutdown() to no longer use an assert? That seems like a semantic change that is stylistic but I may be missing a further motivation for this change.
* Minor style nit: In CommitLog#start, {{if (started) return true;}} should be on two lines per the Cassandra style guides
> Running into deadlock when do CommitLog initialization
> ------------------------------------------------------
>
> Key: CASSANDRA-15295
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Commit Log
> Reporter: Zephyr Guo
> Assignee: Zephyr Guo
> Priority: Normal
> Attachments: image.png, jstack.log, pstack.log, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> Recently, I found a cassandra(3.11.4) node stuck in STARTING status for a long time.
> I used jstack to saw what happened. The main thread stuck in *AbstractCommitLogSegmentManager.awaitAvailableSegment*
> !screenshot-1.png!
> The strange thing is COMMIT-LOG-ALLOCATOR thread state was runnable but it was not actually running.
> !screenshot-2.png!
> And then I used pstack to troubleshoot. I found COMMIT-LOG-ALLOCATOR block on java class initialization.
> !screenshot-3.png!
> This is a deadlock obviously. CommitLog waits for a CommitLogSegment when initializing. In this moment, the CommitLog class is not initialized and the main thread holds the class lock. After that, COMMIT-LOG-ALLOCATOR creates a CommitLogSegment with exception and call *CommitLog.handleCommitError*(static method). COMMIT-LOG-ALLOCATOR will block on this line because CommitLog class is still initializing.
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org