Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/06/07 05:41:55 UTC

[jira] Created: (CASSANDRA-1169) AES makes Streaming unhappy

AES makes Streaming unhappy
---------------------------

                 Key: CASSANDRA-1169
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Jonathan Ellis
            Assignee: Stu Hood
            Priority: Critical
             Fix For: 0.6.3


The streaming service assumes there will be only one stream from S to T at a time, for any nodes S and T.  For the original purpose of node movement this was a reasonable assumption (any node T can perform only one move at a time), but under the right conditions AES (AntiEntropyService) kicks off streaming tasks much more frequently than that, which de-syncs the fragile file ordering that Streaming assumes (that T knows which files S is going to send, and in what order).  Eventually T is expecting file F1 but S sends a smaller file F2, leading to an infinite loop on T while it waits for F1 to finish, while S waits for T to acknowledge F2, which it never will.

For 0.6, the best solution may be for AES to wait for each of its streaming tasks to finish before it allows itself to create another.  For 0.7 it would be nice to make Streaming more robust.  The whole 4-stage-ack process seems very fragile, and poking around in parent objects via InetAddress keys violates encapsulation, which makes reasoning about small pieces impossible.
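
A minimal sketch of the 0.6-style workaround described above, assuming a hypothetical `PerTargetStreamGate` helper that is not part of Cassandra. It allows at most one in-flight stream per target, so a caller such as AES would have to wait for one streaming task to finish before creating the next. Targets are plain strings here purely for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// Hypothetical helper (an assumption, not Cassandra code): one permit per
// target node, so only one stream to that target can be active at a time.
final class PerTargetStreamGate {
    private final ConcurrentMap<String, Semaphore> gates = new ConcurrentHashMap<>();

    private Semaphore gateFor(String target) {
        return gates.computeIfAbsent(target, t -> new Semaphore(1));
    }

    // Non-blocking: returns false if a stream to this target is already active.
    boolean tryBegin(String target) {
        return gateFor(target).tryAcquire();
    }

    // Blocks until no other stream to this target is active.
    void begin(String target) {
        gateFor(target).acquireUninterruptibly();
    }

    // Call exactly once per successful begin/tryBegin, after the transfer
    // has been fully acknowledged.
    void end(String target) {
        gateFor(target).release();
    }
}
```

A caller would wrap each transfer in `begin(target)` and `end(target)`, with `end` in a `finally` block, so that a failed transfer does not leave the target permanently locked.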

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "albert_e (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878232#action_12878232 ] 

albert_e commented on CASSANDRA-1169:
-------------------------------------

StreamOutManager.waitForStreamCompletion() can't block the thread if the StreamOutManager has not been removed from the streamManagers map.

Making StreamOutManager.addFilesToStream() synchronized, and blocking the thread while StreamOutManager.files.size() > 0, may be more efficient.
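
The suggestion above could be sketched roughly as follows. This is an illustration under assumptions, not the actual StreamOutManager: only `addFilesToStream` and the `files` collection are named in the comment, while `BlockingStreamOutSketch`, `finishFile`, and `pending` are hypothetical names, and files are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-target stream manager (an assumption).
final class BlockingStreamOutSketch {
    private final List<String> files = new ArrayList<>();

    // Blocks the caller while files from a previous batch are still pending,
    // so two batches to the same target can never interleave.
    synchronized void addFilesToStream(List<String> newFiles) {
        while (!files.isEmpty()) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for previous stream", e);
            }
        }
        files.addAll(newFiles);
    }

    // Called when the target has acknowledged a file; wakes blocked callers
    // once the whole batch has drained.
    synchronized void finishFile(String file) {
        files.remove(file);
        if (files.isEmpty()) {
            notifyAll();
        }
    }

    synchronized int pending() {
        return files.size();
    }
}
```

The trade-off in this design is that the calling thread is held for the full duration of the previous transfer, which is the intended effect here but would need care on a shared stage.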

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169.txt, aes.txt


[jira] Updated: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek updated CASSANDRA-1169:
-------------------------------------

    Attachment: 1169-2.txt

Instructs AES to remove the active StreamManager when it's finished, and resets the condition in SOM (StreamOutManager) any time files are added, so that it is a bit more reentrant.
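
To illustrate what "reset the condition any time files are added" could mean, here is a minimal sketch of a resettable condition. It is not the code in 1169-2.txt; `ResettableCondition` and its methods are hypothetical names chosen for this example.

```java
// Hypothetical resettable condition (an assumption, not the patch itself).
// A manager would call reset() when files are added and signalAll() when
// the last file has been acknowledged.
final class ResettableCondition {
    private boolean signaled = false;

    // Re-arm the condition so that future waiters actually wait.
    synchronized void reset() {
        signaled = false;
    }

    synchronized void signalAll() {
        signaled = true;
        notifyAll();
    }

    // Waits up to timeoutMillis; returns true if signaled, false on timeout
    // or interruption.
    synchronized boolean await(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!signaled) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Without the `reset()` call, a waiter that arrives after the first signal would return immediately even though a new batch of files is still in flight.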

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169-2.txt, 1169.txt, aes.txt


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876308#action_12876308 ] 

Chris Goffinet commented on CASSANDRA-1169:
-------------------------------------------

As mentioned on IRC, we actually saw T sending to S while anticompaction was running on S using the exact same filename as T. We should probably break streaming out to support `stream/<source>/<sstables>`.
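
A rough sketch of the `stream/<source>/<sstables>` layout proposed above, to show how a per-source directory avoids filename collisions. `StreamPaths`, `incomingPath`, and the sanitizing rule are assumptions made for this example and do not describe how Cassandra names its files.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (an assumption): builds a per-source location for an
// incoming sstable so that files from different nodes cannot collide.
final class StreamPaths {
    private StreamPaths() {}

    static Path incomingPath(Path dataDir, String sourceAddress, String sstableFileName) {
        // Replace characters that are awkward in directory names (e.g. ':' or '/').
        String safeSource = sourceAddress.replaceAll("[^A-Za-z0-9._-]", "_");
        return dataDir.resolve("stream").resolve(safeSource).resolve(sstableFileName);
    }

    static Path example() {
        return incomingPath(Paths.get("data"), "10.0.0.1", "Group-5-Data.db");
    }
}
```

With this layout, the same sstable filename arriving from two different sources, or produced locally by anticompaction, resolves to distinct paths.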

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878670#action_12878670 ] 

Jonathan Ellis commented on CASSANDRA-1169:
-------------------------------------------

Won't removing the active SOM bork things if another stream to that target is in progress?

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169-2.txt, 1169.txt, aes.txt


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878665#action_12878665 ] 

Gary Dusbabek commented on CASSANDRA-1169:
------------------------------------------

albert_e: patch 2 adjusts addFilesToStream to reset the condition so that future waiters do wait.
Lu Ming: I believe you were experiencing that problem.

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169-2.txt, 1169.txt, aes.txt


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Lu Ming (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878234#action_12878234 ] 

Lu Ming commented on CASSANDRA-1169:
------------------------------------

I have applied your patch to Cassandra.
According to my logging in StreamOutManager.addFilesToStream(), the function is still called while StreamOutManager.files.size() > 0.
I think the problem may not be fixed yet.

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169.txt, aes.txt


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878684#action_12878684 ] 

Jonathan Ellis commented on CASSANDRA-1169:
-------------------------------------------

+1 on this patch and do-what-we-can-in-0.6-and-eviscerate-this-crap-in-0.7 in general

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169-2.txt, 1169.txt, aes.txt


[jira] Updated: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated CASSANDRA-1169:
---------------------------------------

    Attachment: aes.txt

I upgraded to 0.6.2 because streaming in 0.6.1 would randomly time out on me. I am still having issues with move, join, and repair. Since I was having so many streaming problems, I turned up the logging for it. Over the past few weeks I have spent a lot of time managing my clusters. I try to do these types of operations in the AM so they have less performance impact, but I have a very low success rate with any move, join, or repair. I have a growing list of nodes to join and ring-management tasks that I keep having to put off due to failures, so anything that makes these processes less brittle would be a big, big deal. Output is attached.

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3
>
>         Attachments: aes.txt


[jira] Issue Comment Edited: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "albert_e (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878232#action_12878232 ] 

albert_e edited comment on CASSANDRA-1169 at 6/12/10 4:25 AM:
--------------------------------------------------------------

StreamOutManager.waitForStreamCompletion() can't block the AES streaming thread if the StreamOutManager has not been removed from the streamManagers map.

Making StreamOutManager.addFilesToStream() synchronized, and blocking the thread while StreamOutManager.files.size() > 0, may be more efficient.

      was (Author: albert_e):
    StreamOutManager.waitForStreamCompletion() can't block the thread if StreamOutManager has not been removed from streamManagers map. 

Make StreamOutManager.addFilesToStream() synchronized and block the thread if StreamOutManager.files.size() > 0 may be more efficient.
  
> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169.txt, aes.txt


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Lu Ming (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878249#action_12878249 ] 

Lu Ming commented on CASSANDRA-1169:
------------------------------------

After I applied the above patch,
StreamOutManager.waitForStreamCompletion() returns immediately and StreamOut.transferSSTables does not wait for its streaming tasks to finish:

27938- INFO [STREAM-STAGE:1] 2010-06-12 17:08:48,810 StreamOut.java (line 132) Sending a stream initiate message to /121.1.1.1...
27939: INFO [STREAM-STAGE:1] 2010-06-12 17:08:48,810 StreamOut.java (line 137) Waiting for transfer to /121.1.1.1 to complete
27940- INFO [STREAM-STAGE:1] 2010-06-12 17:08:48,810 StreamOut.java (line 141) Done with transfer to /121.1.1.1
27941- INFO [AE-SERVICE-STAGE:1] 2010-06-12 17:08:48,811 AntiEntropyService.java (line 641) Finished streaming repair to /121.1.1.1 for (GroupDataStore,Group)
..................................
27982- INFO [STREAM-STAGE:1] 2010-06-12 17:19:22,066 StreamOut.java (line 132) Sending a stream initiate message to /222.222.2.2 ...
27983: INFO [STREAM-STAGE:1] 2010-06-12 17:19:22,066 StreamOut.java (line 137) Waiting for transfer to /222.222.2.2 to complete
27984- INFO [STREAM-STAGE:1] 2010-06-12 17:19:22,066 StreamOut.java (line 141) Done with transfer to /222.222.2.2
27985- INFO [AE-SERVICE-STAGE:1] 2010-06-12 17:19:22,067 AntiEntropyService.java (line 641) Finished streaming repair to /222.222.2.2 for (GroupChat,Topic)
..................................
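
One possible reading of the log above, offered as an assumption rather than a diagnosis: if the completion condition is a one-shot signal, then once it has fired, every later wait returns immediately. The sketch below illustrates that behaviour with `java.util.concurrent.CountDownLatch`. `OneShotConditionDemo` and its method names are hypothetical, and this is not the StreamOutManager code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo (an assumption): a completion signal that can fire only
// once, reused across several transfers to the same target.
final class OneShotConditionDemo {
    private final CountDownLatch done = new CountDownLatch(1);

    // Called when the first transfer completes.
    void signalTransferDone() {
        done.countDown();
    }

    // Returns true if the signal has fired within the timeout.
    boolean waitForCompletion(long timeoutMillis) {
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

After `signalTransferDone()` has been called once, `waitForCompletion` returns true for every subsequent transfer without waiting, which would match "Waiting for transfer" and "Done with transfer" being logged in the same millisecond.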

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169.txt, aes.txt


[jira] Updated: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek updated CASSANDRA-1169:
-------------------------------------

    Attachment: 1169.txt

Ensures that AES streaming happens on the streaming stage and waits for each transfer to complete.

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3
>
>         Attachments: 1169.txt, aes.txt


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876217#action_12876217 ] 

Gary Dusbabek commented on CASSANDRA-1169:
------------------------------------------

Is this the root cause of the problems being experienced at Digg and by Lu Ming on the ML?

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3


[jira] Issue Comment Edited: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "albert_e (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878232#action_12878232 ] 

albert_e edited comment on CASSANDRA-1169 at 6/12/10 4:37 AM:
--------------------------------------------------------------

StreamOutManager.waitForStreamCompletion() can't block the AES streaming thread if StreamOutManager.condition has already been signaled once and the StreamOutManager has not been removed from the streamManagers map.

Making StreamOutManager.addFilesToStream() synchronized, and blocking the thread while StreamOutManager.files.size() > 0, may be more efficient.

      was (Author: albert_e):
    StreamOutManager.waitForStreamCompletion() can't block the AES streaming thread if StreamOutManager has not been removed from streamManagers map and StreamOutManager.condition is signaled once. 

Make StreamOutManager.addFilesToStream() synchronized and block the thread if StreamOutManager.files.size() > 0 may be more efficient.
  
> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169.txt, aes.txt


[jira] Updated: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1169:
--------------------------------------

    Assignee: Gary Dusbabek  (was: Stu Hood)

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3
>
>


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876249#action_12876249 ] 

Jonathan Ellis commented on CASSANDRA-1169:
-------------------------------------------

I think so.  That's what prompted this ticket.

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3
>
>


[jira] Reopened: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek reopened CASSANDRA-1169:
--------------------------------------


> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169.txt, aes.txt
>
>


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878682#action_12878682 ] 

Gary Dusbabek commented on CASSANDRA-1169:
------------------------------------------

Stu: Right.  But I don't think we want to introduce it in 0.6.  I'm hoping just to get things to the point of working and then fix it all in 0.7. 

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169-2.txt, 1169.txt, aes.txt
>
>


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877085#action_12877085 ] 

Edward Capriolo commented on CASSANDRA-1169:
--------------------------------------------

I have this problem as well. I have a 5 node cluster. With a simple repair on keyspace1 (which has nothing but some test data), the streams never complete.

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3
>
>


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877840#action_12877840 ] 

Jonathan Ellis commented on CASSANDRA-1169:
-------------------------------------------

You don't need to do that exception dance with futures: get() rethrows an exception that happened in the background, wrapped in an ExecutionException (and all our executors are DebuggableTPE, which makes sure the exception gets logged even if get() is never called).

LGTM otherwise
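
To illustrate the point about futures, here is a minimal sketch using only java.util.concurrent. The class name FutureExceptionSketch and the simulated IOException are made up for illustration, and a plain single-thread executor stands in for Cassandra's DebuggableTPE, so the logging behaviour mentioned above is not shown.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureExceptionSketch {
    // Runs a task that fails on a background thread and returns the original
    // exception, recovered from the ExecutionException thrown by Future.get().
    static Throwable causeOfFailedTask() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> future = executor.submit(new Callable<Void>() {
                public Void call() throws Exception {
                    throw new IOException("simulated streaming failure");
                }
            });
            try {
                future.get();
                return null; // not reached: the task always throws
            } catch (ExecutionException e) {
                return e.getCause(); // the exception thrown inside call()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        Throwable cause = causeOfFailedTask();
        System.out.println(cause.getClass().getName() + ": " + cause.getMessage());
    }
}
```

Because get() already surfaces the background failure, the caller does not need to capture the exception by hand inside the task and rethrow it afterwards.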

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3
>
>         Attachments: 1169.txt, aes.txt
>
>


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878675#action_12878675 ] 

Gary Dusbabek commented on CASSANDRA-1169:
------------------------------------------

SOM.remove() checks for that.  (I'm not saying the code is perfect--it's not--but I don't think removing the SOM is going to mess things up.)

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169-2.txt, 1169.txt, aes.txt
>
>


[jira] Issue Comment Edited: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "albert_e (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878232#action_12878232 ] 

albert_e edited comment on CASSANDRA-1169 at 6/12/10 4:35 AM:
--------------------------------------------------------------

StreamOutManager.waitForStreamCompletion() can't block the AES streaming thread if the StreamOutManager has not been removed from the streamManagers map and StreamOutManager.condition has been signaled once.

Making StreamOutManager.addFilesToStream() synchronized and blocking the thread if StreamOutManager.files.size() > 0 may be more efficient.

      was (Author: albert_e):
    StreamOutManager.waitForStreamCompletion() can't block the AES streaming thread if StreamOutManager has not been removed from streamManagers map. 

Make StreamOutManager.addFilesToStream() synchronized and block the thread if StreamOutManager.files.size() > 0 may be more efficient.
  
> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169.txt, aes.txt
>
>


[jira] Commented: (CASSANDRA-1169) AES makes Streaming unhappy

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878677#action_12878677 ] 

Stu Hood commented on CASSANDRA-1169:
-------------------------------------

Rather than an 'endpoint->StreamManager' map, we really should have a 'session_id->StreamManager' map.
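
A rough sketch of what a 'session_id->StreamManager' map could look like. All names here (SessionKeyedStreams, StreamManagerSketch, begin, finish) are hypothetical and chosen for illustration; this is not the design that was committed. The point is only that two concurrent streams to the same endpoint get separate entries instead of colliding on one endpoint key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class SessionKeyedStreams {
    // Hypothetical stand-in for a per-stream manager; not Cassandra's class.
    public static final class StreamManagerSketch {
        public final String endpoint;
        public final long sessionId;

        StreamManagerSketch(String endpoint, long sessionId) {
            this.endpoint = endpoint;
            this.sessionId = sessionId;
        }
    }

    private final AtomicLong nextSessionId = new AtomicLong();
    private final ConcurrentMap<Long, StreamManagerSketch> managers =
            new ConcurrentHashMap<Long, StreamManagerSketch>();

    // Each call creates an independent session, even for the same endpoint.
    public StreamManagerSketch begin(String endpoint) {
        long id = nextSessionId.incrementAndGet();
        StreamManagerSketch manager = new StreamManagerSketch(endpoint, id);
        managers.put(id, manager);
        return manager;
    }

    public StreamManagerSketch get(long sessionId) {
        return managers.get(sessionId);
    }

    // Removing one session leaves other sessions to the same endpoint untouched.
    public void finish(long sessionId) {
        managers.remove(sessionId);
    }

    public int activeCount() {
        return managers.size();
    }

    public static void main(String[] args) {
        SessionKeyedStreams streams = new SessionKeyedStreams();
        StreamManagerSketch first = streams.begin("10.0.0.2");
        StreamManagerSketch second = streams.begin("10.0.0.2");
        streams.finish(first.sessionId);
        System.out.println("active sessions: " + streams.activeCount()
                + ", remaining session id: " + second.sessionId);
    }
}
```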

> AES makes Streaming unhappy
> ---------------------------
>
>                 Key: CASSANDRA-1169
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1169
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 1169-2.txt, 1169.txt, aes.txt
>
>