Posted to commits@cassandra.apache.org by "Brandon Williams (Created) (JIRA)" <ji...@apache.org> on 2012/03/14 23:38:39 UTC

[jira] [Created] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Stream sessions can only fail via the FailureDetector
-----------------------------------------------------

                 Key: CASSANDRA-4051
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.1.0
            Reporter: Brandon Williams
            Assignee: Brandon Williams
             Fix For: 1.1.1


If FileStreamTask itself fails more than the permitted number of retry attempts while gossip continues to work, the stream session will never be closed.  This is unlikely to happen in practice, since it requires blocking new connections to the storage port while keeping the existing ones open.  For the bulk loader, however, it is especially problematic: the bulk loader has no access to a failure detector and thus no way of knowing whether a session failed.
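
The failure mode described above can be illustrated with a minimal Java sketch. All names here (StreamSessionSketch, Transfer, RetryingStreamTask) are hypothetical and are not Cassandra's actual classes. The sketch only shows a task that closes its session as failed once its retry attempts are exhausted, so that a caller without a failure detector still learns the outcome.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a stream session; not Cassandra's StreamOutSession.
final class StreamSessionSketch {
    enum State { OPEN, COMPLETED, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.OPEN);

    // Only the first transition out of OPEN takes effect.
    void close(boolean success) {
        state.compareAndSet(State.OPEN, success ? State.COMPLETED : State.FAILED);
    }

    State state() {
        return state.get();
    }
}

// Hypothetical transfer step that may fail with an IOException.
interface Transfer {
    void run() throws IOException;
}

// Hypothetical task that retries a transfer a bounded number of times.
final class RetryingStreamTask {
    private final StreamSessionSketch session;
    private final Transfer transfer;
    private final int maxAttempts;
    private int attempts = 0;

    RetryingStreamTask(StreamSessionSketch session, Transfer transfer, int maxAttempts) {
        this.session = session;
        this.transfer = transfer;
        this.maxAttempts = maxAttempts;
    }

    void execute() {
        while (attempts < maxAttempts) {
            attempts++;
            try {
                transfer.run();
                session.close(true);
                return;
            } catch (IOException e) {
                // Transfer failed; loop around and retry if attempts remain.
            }
        }
        // Every attempt failed: close the session explicitly instead of
        // leaving it open until a failure detector notices.
        session.close(false);
    }

    int attempts() {
        return attempts;
    }
}
```

With a transfer that always throws and a limit of 8 attempts, the session in this sketch ends in the FAILED state rather than staying open indefinitely.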

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4051:
----------------------------------------

    Affects Version/s:     (was: 1.1.0)
    
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.1.1


        

[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4051:
----------------------------------------

    Reviewer: yukim  (was: slebresne)
    
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>              Labels: streaming
>             Fix For: 1.1.0
>
>         Attachments: 4051.txt


        

[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4051:
----------------------------------------

    Reviewer: slebresne
      Labels: streaming  (was: )
    
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>              Labels: streaming
>             Fix For: 1.1.0
>
>         Attachments: 4051.txt


        

[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4051:
----------------------------------------

    Fix Version/s:     (was: 1.1.1)
                   1.1.0
    
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.1.0


        

[jira] [Reopened] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Brandon Williams (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reopened CASSANDRA-4051:
-----------------------------------------


Reopening because this fixes only one failure path: FileStreamTask can still fail all 8 times and never close the session.  In general, outbound streaming's "fire and forget" approach is problematic for bulk loading.
                
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>              Labels: streaming
>             Fix For: 1.1.0
>
>         Attachments: 4051-v2.txt, 4051.txt


        

[jira] [Commented] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229855#comment-13229855 ] 

Brandon Williams commented on CASSANDRA-4051:
---------------------------------------------

It looks like we could extract/rebase the streaming changes from CASSANDRA-3112's first patch to solve this well enough for the bulk loader and BOF.
                
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.1.1


        

[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4051:
----------------------------------------

    Attachment: 4051.txt

Updated patch, extracted as mentioned. It doesn't change any streaming behavior, but it does provide a way to detect errors that CASSANDRA-3112 and CASSANDRA-4045 can build on.
                
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>              Labels: streaming
>             Fix For: 1.1.0
>
>         Attachments: 4051.txt


        

[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4051:
--------------------------------------

    Attachment: 4051-v2.txt

Patch attached, based on CASSANDRA-3817 with a retry limit.
(I think it would be nice to have a retry limit per stream session, so that we can configure, say, no retries for bulk loading, which I think is enough. But that's beyond the scope of this issue.)

Brandon, can you test and see whether BOF is OK with this patch?
                
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>              Labels: streaming
>             Fix For: 1.1.0
>
>         Attachments: 4051-v2.txt, 4051.txt


        

[jira] [Updated] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4051:
--------------------------------------

    Attachment: 4051-v3.txt

v3 attached for 1.1 branch.

It basically catches IOException on both sides and lets the sessions close.
I also implemented IStreamCallback#onFailure to make sure the latches are counted down, so that the process does not hang.
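
The latch handling described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the attached patch: StreamCallbackSketch and LatchCallback are hypothetical names loosely modeled on IStreamCallback, and only java.util.concurrent classes are real APIs. The point is that the failure path counts the latch down as well, so a thread waiting on the latch is released whether a stream succeeds or fails.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical callback interface, loosely modeled on the
// IStreamCallback#onFailure hook mentioned above.
interface StreamCallbackSketch {
    void onSuccess();
    void onFailure();
}

// Hypothetical callback that releases a waiter on success and on failure.
final class LatchCallback implements StreamCallbackSketch {
    private final CountDownLatch latch;
    private final AtomicBoolean failed = new AtomicBoolean(false);

    LatchCallback(CountDownLatch latch) {
        this.latch = latch;
    }

    @Override
    public void onSuccess() {
        latch.countDown();
    }

    @Override
    public void onFailure() {
        // Record the failure, then still count down so the waiter wakes up.
        failed.set(true);
        latch.countDown();
    }

    boolean failed() {
        return failed.get();
    }
}
```

A caller that waits for two streams would create a CountDownLatch with a count of 2, hand one LatchCallback to each stream, and call await on the latch. Afterwards it can check failed() on each callback to see which streams did not complete.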
                
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>              Labels: streaming
>             Fix For: 1.1.0
>
>         Attachments: 4051-v2.txt, 4051-v3.txt, 4051.txt


        

[jira] [Commented] (CASSANDRA-4051) Stream sessions can only fail via the FailureDetector

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239614#comment-13239614 ] 

Yuki Morishita commented on CASSANDRA-4051:
-------------------------------------------

Since CASSANDRA-3216 added IEndpointStateChangeSubscriber and IFailureDetectionEventListener to StreamOutSession, we need to keep that functionality. On CASSANDRA-3817 I proposed a modified version of CASSANDRA-3112, minus the retry-limiting part. I would like to rebase that patch and add the retry limit so that I can post it here. (I will post it soon.)
                
> Stream sessions can only fail via the FailureDetector
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4051
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4051
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>              Labels: streaming
>             Fix For: 1.1.0
>
>         Attachments: 4051.txt
