Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2015/05/27 04:21:17 UTC
[jira] [Commented] (CASSANDRA-9458) Race condition causing StreamSession to get stuck in WAIT_COMPLETE
[ https://issues.apache.org/jira/browse/CASSANDRA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560301#comment-14560301 ]
Yuki Morishita commented on CASSANDRA-9458:
-------------------------------------------
Thanks for providing the log and the patch.
I don't think the problem here is the race though. StreamSession's {{state}} is guarded by synchronized methods.
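To illustrate why synchronized methods rule out the interleaving in the report, here is a minimal sketch. It assumes a simplified, hypothetical model (SyncSessionModel, its State enum and runOnce() are illustrative names, not Cassandra's StreamSession). Both completion paths read and write the state under the same monitor, so whichever runs second always sees WAIT_COMPLETE.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical, simplified model of the two completion paths; this is NOT
// Cassandra's StreamSession code, only an illustration of the locking.
class SyncSessionModel {
    enum State { STREAMING, WAIT_COMPLETE, COMPLETE }

    private State state = State.STREAMING;

    // Local side finished its last task (modeled on maybeComplete()).
    synchronized void maybeComplete() {
        // Check and write happen under one lock: the second caller sees WAIT_COMPLETE.
        state = (state == State.WAIT_COMPLETE) ? State.COMPLETE : State.WAIT_COMPLETE;
    }

    // Peer's COMPLETE message arrived (modeled on complete()).
    synchronized void complete() {
        state = (state == State.WAIT_COMPLETE) ? State.COMPLETE : State.WAIT_COMPLETE;
    }

    synchronized State state() {
        return state;
    }

    // Releases both paths at the same moment and returns the final state.
    static State runOnce() {
        SyncSessionModel session = new SyncSessionModel();
        CountDownLatch start = new CountDownLatch(1);
        Thread local = new Thread(() -> { awaitQuietly(start); session.maybeComplete(); });
        Thread peer = new Thread(() -> { awaitQuietly(start); session.complete(); });
        local.start();
        peer.start();
        start.countDown();
        try {
            local.join();
            peer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return session.state();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling runOnce() repeatedly should always end in COMPLETE in this model. Only if the check and the write were not under the same lock could both paths take the WAIT_COMPLETE branch.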
From the log, I think both ends are in {{WAIT_COMPLETE}} state, and {{/11.22.33.44}} is waiting for {{/11.22.33.55}} to send {{CompleteMessage}} after it finishes finalizing the received files (in {{StreamReceiveTask.OnCompletionRunnable}}).
Do you have secondary indexes? Right now, streaming is considered complete only after secondary indexes are built in that finalize phase (CASSANDRA-9308).
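The alternative explanation above (the peer stays in WAIT_COMPLETE because the receiver sends its CompleteMessage only after finalization, including any index build, has finished) can be sketched as follows. This is a hypothetical model: DeferredCompleteModel, its latches and method names are illustrative assumptions, and the latch stands in for the work done in StreamReceiveTask.OnCompletionRunnable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model: the receiving node tells its peer it is done only after
// a background finalize task (standing in for OnCompletionRunnable) finishes.
class DeferredCompleteModel {
    // Released when the simulated secondary index build finishes.
    final CountDownLatch indexBuildFinished = new CountDownLatch(1);
    // Counted down when the peer receives this node's CompleteMessage.
    private final CountDownLatch peerGotComplete = new CountDownLatch(1);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Last file received: finalize in the background, then notify the peer.
    void onAllFilesReceived() {
        executor.execute(() -> {
            try {
                indexBuildFinished.await();   // stands in for the index build
                peerGotComplete.countDown();  // stands in for sending CompleteMessage
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // True if the peer left WAIT_COMPLETE within the given number of milliseconds.
    boolean peerCompletedWithin(long millis) {
        try {
            return peerGotComplete.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

In this model the peer cannot leave WAIT_COMPLETE while the index build is still running, so a long build looks like a stuck session from the sender's side even though no race is involved.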
> Race condition causing StreamSession to get stuck in WAIT_COMPLETE
> ------------------------------------------------------------------
>
> Key: CASSANDRA-9458
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9458
> Project: Cassandra
> Issue Type: Bug
> Reporter: Omid Aladini
> Assignee: Omid Aladini
> Priority: Critical
> Fix For: 2.1.x, 2.0.x
>
> Attachments: 9458-v1.txt
>
>
> I think there is a race condition in StreamSession where one side of the stream can get stuck in WAIT_COMPLETE although both sides have sent COMPLETE messages. Consider a scenario in which node B is being bootstrapped and only receives files during the session:
> 1- During a stream session A sends some files to B and B sends no files to A.
> 2- Once B completes the last task (receiving), StreamSession::maybeComplete is invoked.
> 3- While B is sending the COMPLETE message via StreamSession::maybeComplete, it also receives the COMPLETE message from A and therefore StreamSession::complete() is invoked.
> 4- As a result, both maybeComplete() and complete() take the state != State.WAIT_COMPLETE branch, and both set the state to WAIT_COMPLETE.
> 5- B is now waiting to receive COMPLETE although it has already received it, and nothing triggers a re-check of the state, so it waits until it times out after streaming_socket_timeout_in_ms.
> In the log below:
> https://gist.github.com/omidaladini/003de259958ad8dfb07e
> although the node has received COMPLETE, a SocketTimeoutException is thrown after streaming_socket_timeout_in_ms (30 minutes here).
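The interleaving in steps 3 to 5 of the description is a classic check-then-act race, and it can only happen if the state check and the state write are not performed under the same lock. Below is a minimal, deterministic replay of it under that assumption. RacyReplay, replay() and next() are hypothetical names, not Cassandra code: the interleaved run has both paths read the state before either writes it, while the sequential run lets each path read and write in turn.

```java
// Hypothetical replay of an UNSYNCHRONIZED check-then-act on the session state.
class RacyReplay {
    enum State { STREAMING, WAIT_COMPLETE, COMPLETE }

    static State replay(boolean interleaved) {
        State state = State.STREAMING;
        if (interleaved) {
            // Step 3: maybeComplete() and complete() both read before either writes.
            boolean localSawWait = state == State.WAIT_COMPLETE;
            boolean peerSawWait = state == State.WAIT_COMPLETE;
            // Step 4: both take the state != WAIT_COMPLETE branch.
            state = next(localSawWait);
            state = next(peerSawWait);
        } else {
            state = next(state == State.WAIT_COMPLETE); // maybeComplete() runs alone
            state = next(state == State.WAIT_COMPLETE); // then complete() runs alone
        }
        // Step 5: nothing re-checks the state after this point.
        return state;
    }

    // Shared branch logic: close if WAIT_COMPLETE was seen, otherwise wait.
    private static State next(boolean sawWaitComplete) {
        return sawWaitComplete ? State.COMPLETE : State.WAIT_COMPLETE;
    }
}
```

replay(true) ends in WAIT_COMPLETE (the stuck session from the report), while replay(false) ends in COMPLETE.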
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)