Posted to commits@cassandra.apache.org by "Brandon Williams (Created) (JIRA)" <ji...@apache.org> on 2012/03/14 23:38:39 UTC
[jira] [Created] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Stream sessions can only fail via the FailureDetector
-----------------------------------------------------
Key: CASSANDRA-4051
URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.0
Reporter: Brandon Williams
Assignee: Brandon Williams
Fix For: 1.1.1
If FileStreamTask itself fails more than the allowed number of retry attempts while gossip continues to work, the stream session will never be closed. This is unlikely to happen in practice, since it requires the storage port to be blocked for new connections while existing ones are kept. For the bulk loader, however, it is especially problematic: the bulk loader has no access to a failure detector and thus no way of knowing whether a session failed.
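To illustrate the failure mode described above, here is a minimal sketch of a retry loop that always closes its session once the attempts are exhausted, so a caller without a failure detector can still observe the failure. The names StreamRetrySketch, Session, Attempt and stream are hypothetical and are not Cassandra's actual classes; the value 8 in main is only an example limit.

```java
import java.io.IOException;

// Hypothetical sketch; these names are illustrative and are not Cassandra's classes.
class StreamRetrySketch {
    /** Minimal stand-in for a stream session that records how it was closed. */
    static final class Session {
        private boolean closed;
        private boolean success;

        void close(boolean success) {
            this.closed = true;
            this.success = success;
        }

        boolean isClosed() { return closed; }
        boolean succeeded() { return success; }
    }

    /** One attempt to send a file; throws IOException if the connection fails. */
    interface Attempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times and returns the number of
     * attempts made. The session is closed on every path out of this method.
     */
    static int stream(Session session, Attempt attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                session.close(true);
                return i;
            } catch (IOException e) {
                // Swallow the error and retry until the attempts are exhausted.
            }
        }
        // Retries exhausted: close the session as failed instead of leaving it open.
        session.close(false);
        return maxAttempts;
    }

    public static void main(String[] args) {
        Session session = new Session();
        int attempts = stream(session, () -> { throw new IOException("connection refused"); }, 8);
        System.out.println("attempts=" + attempts + " closed=" + session.isClosed()
                + " succeeded=" + session.succeeded());
    }
}
```

The point of the sketch is only that the final close(false) exists; without it, the loop would end silently and the session would stay open, which is the behavior this issue reports.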
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-4051:
----------------------------------------
Affects Version/s: (was: 1.1.0)
[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-4051:
----------------------------------------
Reviewer: yukim (was: slebresne)
[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-4051:
----------------------------------------
Reviewer: slebresne
Labels: streaming (was: )
[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-4051:
----------------------------------------
Fix Version/s: 1.1.0 (was: 1.1.1)
[jira] [Reopened] (CASSANDRA-4051) Stream sessions can only fail
via the FailureDetector
Posted by "Brandon Williams (Reopened) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams reopened CASSANDRA-4051:
-----------------------------------------
Reopening because this fixes the problem in only one way: FileStreamTask can still fail all 8 times and never close the session. In general, outbound streaming's "fire and forget" methodology is problematic for bulk loading.
[jira] [Commented] (CASSANDRA-4051) Stream sessions can only fail
via the FailureDetector
Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229855#comment-13229855 ]
Brandon Williams commented on CASSANDRA-4051:
---------------------------------------------
It looks like we could extract/rebase the streaming changes from CASSANDRA-3112's first patch to solve this well enough for the bulk loader and BOF.
[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-4051:
----------------------------------------
Attachment: 4051.txt
Updated patch, extracted as mentioned. It doesn't change any streaming behavior, but it does provide a way to detect errors that CASSANDRA-3112 and CASSANDRA-4045 can build on.
[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuki Morishita updated CASSANDRA-4051:
--------------------------------------
Attachment: 4051-v2.txt
Patch attached, based on CASSANDRA-3817, with a retry limit.
(I think it would be nice to have a retry limit per stream session, so that we could configure, say, no retries for bulk loading, which I think is enough. But that's beyond the scope of this issue.)
Brandon, can you test and see if BOF is OK with this patch?
[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via
the FailureDetector
Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuki Morishita updated CASSANDRA-4051:
--------------------------------------
Attachment: 4051-v3.txt
v3 attached for 1.1 branch.
It basically catches IOException on both sides and lets the sessions close.
I also implemented IStreamCallback#onFailure to make sure the latches are counted down, so the process does not hang.
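As a rough sketch of the latch idea mentioned above: a caller that waits on a CountDownLatch will hang forever unless the failure path counts the latch down as well. The Callback interface below is a hypothetical stand-in and may not match the real IStreamCallback; LatchCallback and hasFailed are illustrative names, not part of the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the code from the attached patch.
class LatchCallbackSketch {
    /** Assumed callback shape; the real IStreamCallback may differ. */
    interface Callback {
        void onSuccess();
        void onFailure();
    }

    /** Counts the shared latch down on both outcomes and remembers failures. */
    static final class LatchCallback implements Callback {
        private final CountDownLatch latch;
        private volatile boolean failed;

        LatchCallback(CountDownLatch latch) {
            this.latch = latch;
        }

        @Override
        public void onSuccess() {
            latch.countDown();
        }

        @Override
        public void onFailure() {
            failed = true;
            // Without this countDown a waiting thread would block indefinitely.
            latch.countDown();
        }

        boolean hasFailed() {
            return failed;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);
        LatchCallback first = new LatchCallback(latch);
        LatchCallback second = new LatchCallback(latch);

        first.onSuccess();
        second.onFailure();

        // Both sessions reported a result, so await returns without timing out.
        boolean finished = latch.await(1, TimeUnit.SECONDS);
        System.out.println("finished=" + finished + " secondFailed=" + second.hasFailed());
    }
}
```

Recording the failure separately from the count-down lets the waiting code distinguish "everything finished" from "everything finished successfully".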
[jira] [Commented] (CASSANDRA-4051) Stream sessions can only fail
via the FailureDetector
Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239614#comment-13239614 ]
Yuki Morishita commented on CASSANDRA-4051:
-------------------------------------------
Since CASSANDRA-3216 added IEndpointStateChangeSubscriber and IFailureDetectionEventListener to StreamOutSession, we need to keep that functionality. On CASSANDRA-3817 I proposed a modified version of CASSANDRA-3112, minus the retry-limiting part. I would like to rebase that patch and add the retry limit, so that I can post it here. (I will post it soon.)