Posted to commits@cassandra.apache.org by "Jordan West (Jira)" <ji...@apache.org> on 2019/11/12 17:53:00 UTC

[jira] [Commented] (CASSANDRA-15295) Running into deadlock when do CommitLog initialization

    [ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972662#comment-16972662 ] 

Jordan West commented on CASSANDRA-15295:
-----------------------------------------

Hi [~djoshi], I like the approach of removing the potential for re-introducing the bug. However, it seems {{CommitLog}} was only partially changed to be thread-safe. For example,
 concurrent calls to {{#start()}} and {{#shutdownBlocking()}} could leave the {{CommitLog}} in an invalid state: if the thread executing {{#start()}} pauses before calling {{executor.start()}} and resumes
 only after a separate thread executing {{#shutdownBlocking}} calls {{executor.shutdown()}} and {{executor.awaitTermination}} (which will return immediately since {{thread == null}}), the {{segmentManager}} will be shut down but the {{executor}} will still be running.
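
The interleaving described above can be replayed step by step in a small toy model. This is only a sketch under stated assumptions: {{ToyExecutor}} and {{ToySegmentManager}} are hypothetical stand-ins, not the real Cassandra classes, and the order of calls inside {{start()}} and {{shutdownBlocking()}} is assumed from the description rather than copied from the patch. The two "threads" are replayed sequentially in one thread so the outcome is deterministic.

```java
public class StartShutdownInterleaving
{
    /** Toy executor (hypothetical): "thread" stays null until start() runs. */
    static final class ToyExecutor
    {
        Object thread; // stands in for the real worker thread

        void start()
        {
            thread = new Object();
        }

        void shutdown()
        {
            // Nothing to signal if start() has not run yet.
        }

        void awaitTermination()
        {
            if (thread == null)
                return; // nothing to wait for, so this returns immediately
            thread = null;
        }

        boolean isRunning()
        {
            return thread != null;
        }
    }

    /** Toy segment manager (hypothetical): just tracks a running flag. */
    static final class ToySegmentManager
    {
        boolean running;

        void start()
        {
            running = true;
        }

        void shutdown()
        {
            running = false;
        }
    }

    /** Replays the problematic interleaving and reports the final state. */
    public static String replay()
    {
        ToyExecutor executor = new ToyExecutor();
        ToySegmentManager segmentManager = new ToySegmentManager();

        // Thread A, start(): assumed to start the segment manager first, then pause.
        segmentManager.start();

        // Thread B, shutdownBlocking(): runs to completion while A is paused.
        executor.shutdown();
        executor.awaitTermination(); // thread == null, returns immediately
        segmentManager.shutdown();

        // Thread A resumes and starts the executor after shutdown has finished.
        executor.start();

        return "segmentManager running=" + segmentManager.running
               + ", executor running=" + executor.isRunning();
    }

    public static void main(String[] args)
    {
        System.out.println(replay());
    }
}
```

In this model the replay ends with the segment manager stopped and the executor running, which is the invalid state in question. Guarding both methods with a common lock would rule out the interleaving, but whether that is worth doing depends on whether {{CommitLog}} needs to be thread-safe at all.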

 

Is it necessary to make {{CommitLog}} thread-safe? Removing the singleton doesn’t really change the odds of it being used from multiple threads, and the original version wasn’t thread-safe either w.r.t. these functions.

Some other comments/minor nits:
 * I like moving the factory code to a function to reduce the amount of new code
 * Is it necessary to change AbstractCommitLogSegmentManager#shutdown() to no longer use an assert? That seems like a semantic change made for stylistic reasons, but I may be missing a further motivation for it.
 * Minor style nit: in CommitLog#start, {{if (started) return true;}} should be on two lines per the Cassandra style guide

> Running into deadlock when do CommitLog initialization
> ------------------------------------------------------
>
>                 Key: CASSANDRA-15295
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Normal
>         Attachments: image.png, jstack.log, pstack.log, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a long time.
>  I used jstack to see what happened. The main thread was stuck in *AbstractCommitLogSegmentManager.awaitAvailableSegment*
>  !screenshot-1.png! 
> The strange thing is that the COMMIT-LOG-ALLOCATOR thread state was runnable, but it was not actually running.  
>  !screenshot-2.png! 
> I then used pstack to troubleshoot and found COMMIT-LOG-ALLOCATOR blocked on Java class initialization.
>   !screenshot-3.png! 
> This is clearly a deadlock. CommitLog waits for a CommitLogSegment while initializing. At this moment, the CommitLog class is not yet initialized and the main thread holds the class initialization lock. Meanwhile, COMMIT-LOG-ALLOCATOR hits an exception while creating a CommitLogSegment and calls *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR blocks on this call because the CommitLog class is still initializing.
>  
>  
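
The class-initialization blocking described in the issue can be reproduced in miniature. The following is a minimal sketch, not the Cassandra code: {{ToyCommitLog}}, {{ToyAllocator}}, and {{touch()}} are hypothetical names, and the 500 ms timeout on {{join}} is an assumption added so the demo terminates instead of deadlocking. The static initializer waits on a helper thread, and the helper thread calls a static method on the class that is still being initialized, so it has to wait for initialization to finish.

```java
public class ClassInitBlockingSketch
{
    // Kept on the outer class so that reading them never triggers
    // ToyCommitLog's initialization.
    static volatile boolean allocatorBlockedDuringInit;
    static volatile boolean handlerRan;
    static volatile Thread allocator;

    /** Hypothetical stand-in: its static initializer waits on another thread. */
    static final class ToyCommitLog
    {
        static
        {
            allocator = new Thread(new ToyAllocator(), "TOY-ALLOCATOR");
            allocator.start();
            try
            {
                // The scenario in the issue waits without a timeout; the
                // timeout here is what lets this demo finish.
                allocator.join(500);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
            // Still alive means the helper could not complete its static call.
            allocatorBlockedDuringInit = allocator.isAlive();
        }

        static void handleCommitError()
        {
            handlerRan = true;
        }

        static void touch()
        {
            // Exists only to trigger class initialization.
        }
    }

    /** Hypothetical helper thread body: calls a static method on ToyCommitLog. */
    static final class ToyAllocator implements Runnable
    {
        @Override
        public void run()
        {
            // Blocks until ToyCommitLog has finished initializing.
            ToyCommitLog.handleCommitError();
        }
    }

    /** Returns true if the helper was blocked during init and ran afterwards. */
    public static boolean demo()
    {
        ToyCommitLog.touch(); // first use: runs the static initializer
        try
        {
            allocator.join(); // after init completes the helper can proceed
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return allocatorBlockedDuringInit && handlerRan;
    }

    public static void main(String[] args)
    {
        System.out.println("allocator blocked during class init: " + demo());
    }
}
```

Without the timeout, the initializing thread would wait forever for the helper while the helper waits forever for initialization to complete, which matches the shape of the reported deadlock.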


