Posted to commits@cassandra.apache.org by "Zephyr Guo (Jira)" <ji...@apache.org> on 2019/10/10 07:19:00 UTC

[jira] [Comment Edited] (CASSANDRA-15295) Running into deadlock when do CommitLog initialization

    [ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948268#comment-16948268 ] 

Zephyr Guo edited comment on CASSANDRA-15295 at 10/10/19 7:18 AM:
------------------------------------------------------------------

Hi [~djoshi], thanks for the review. I agree with most of what you said, and your change is a very minimal fix for this issue. However, there are a few points I would like to raise.

1. Most of the changes in my patch are there to build a unit test (UT) that checks the exception case for CommitLog. That is worth having in Cassandra. We not only have to fix this problem, we also need to understand its root cause (the lack of exception tests).

2. Your change introduces the risk of CommitLog being started twice. CommitLog was designed as a singleton that manages its own lifecycle. When other modules access CommitLog.instance, they expect an initialized CommitLog. Your change alters the original initialization process.

3. The major change in my patch (moving the code to a different class) is very simple. It DOES NOT alter the original initialization process.

4. I agree with your statement: "I think it is important to get the correctness issue resolved first". Don't you think that moving the code to another class is the easiest way to do that?


I respect your decision. Please incorporate my patch to get a better one. What's next?


was (Author: gzh1992n):
Hi [~djoshi], thanks for the review. I agree with most of what you said, and your change is a very minimal fix for this issue. However, there are a few points I would like to raise.

1. Most of the changes in my patch go toward building a unit test (UT) that covers the exception case for CommitLog. That is worth having in Cassandra. We not only have to fix this problem, we also need to understand its root cause (the lack of exception tests).

2. Your change introduces the risk of CommitLog being started twice. CommitLog was designed as a singleton that manages its own lifecycle. When other modules access CommitLog.instance, they expect an initialized CommitLog. Your change alters the original initialization process.

3. The major change in my patch (moving the code to a different class) is very simple. It DOES NOT alter the original initialization process.

4. I agree with your statement: "I think it is important to get the correctness issue resolved first". Don't you think that moving the code to another class is the easiest way to do that?


I respect your decision. Please incorporate my patch to get a better one. What's next?

> Running into deadlock when do CommitLog initialization
> ------------------------------------------------------
>
>                 Key: CASSANDRA-15295
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Normal
>         Attachments: jstack.log, pstack.log, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a long time.
>  I used jstack to see what had happened. The main thread was stuck in *AbstractCommitLogSegmentManager.awaitAvailableSegment*.
>  !screenshot-1.png! 
> The strange thing is that the COMMIT-LOG-ALLOCATOR thread's state was RUNNABLE, but it was not actually running.
>  !screenshot-2.png! 
> I then used pstack to troubleshoot further and found COMMIT-LOG-ALLOCATOR blocked on Java class initialization.
>   !screenshot-3.png! 
> This is obviously a deadlock. CommitLog waits for a CommitLogSegment during its initialization. At that moment, the CommitLog class is not yet initialized and the main thread holds the class initialization lock. Meanwhile, COMMIT-LOG-ALLOCATOR hits an exception while creating a CommitLogSegment and calls *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR blocks on this call because the CommitLog class is still initializing.
>  
>  
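
The class-initialization deadlock described above can be reproduced in isolation. The following is a minimal sketch, not the Cassandra code: Log, Worker, and handleError are hypothetical stand-ins for CommitLog, the COMMIT-LOG-ALLOCATOR thread, and CommitLog.handleCommitError. The 500 ms timeout is an assumption added only so that the sketch terminates; according to the report, the real wait had no such bound, which is why the node hung.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ClassInitDeadlockSketch {

    // Hypothetical stand-in for CommitLog: its static initializer
    // starts a worker thread and waits for that worker to finish.
    static class Log {
        static final boolean workerFinishedInTime;

        static {
            CountDownLatch done = new CountDownLatch(1);
            Thread worker = new Thread(new Worker(done), "ALLOCATOR-SKETCH");
            worker.setDaemon(true);
            worker.start();

            boolean finished = false;
            try {
                // Bounded wait (an assumption of this sketch) so the demo ends.
                // An unbounded wait here would never return.
                finished = done.await(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            workerFinishedInTime = finished;
        }

        // Hypothetical stand-in for the static error handler.
        static void handleError() {
            // Intentionally empty.
        }
    }

    // Hypothetical stand-in for the allocator thread's work.
    static class Worker implements Runnable {
        private final CountDownLatch done;

        Worker(CountDownLatch done) {
            this.done = done;
        }

        @Override
        public void run() {
            // Invoking a static method requires Log to be initialized.
            // Another thread is still running Log's static initializer,
            // so the JVM blocks this call until that initializer returns.
            Log.handleError();
            done.countDown();
        }
    }

    // Triggers initialization of Log and reports whether the worker
    // managed to finish while the static initializer was still running.
    static boolean workerFinishedDuringInit() {
        return Log.workerFinishedInTime;
    }

    public static void main(String[] args) {
        System.out.println("worker finished during init: " + workerFinishedDuringInit());
    }
}
```

Running main prints "worker finished during init: false": the worker cannot get past Log.handleError() until the static initializer has returned, and the initializer is itself waiting for the worker. Moving the static handler into a separate class that is not mid-initialization, which is the direction argued for in the comment above, would remove the cycle in this sketch.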


