Posted to commits@cassandra.apache.org by "Gary Dusbabek (JIRA)" <ji...@apache.org> on 2010/06/14 20:49:14 UTC

[jira] Created: (CASSANDRA-1189) Refactor streaming

Refactor streaming
------------------

                 Key: CASSANDRA-1189
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.7
            Reporter: Gary Dusbabek
            Assignee: Gary Dusbabek
            Priority: Critical
             Fix For: 0.7


The current architecture is buggy because it assumes that only one stream can be in progress between two nodes at a given time, and that the stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.

The process is somewhat contorted too (request, initiate, initiate done, send).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1189:
--------------------------------

    Comment: was deleted

(was: AntiEntropyService.RepairSession in trunk and 0.6 only has a local view, so it can't know when other nodes have finished streaming data for repairs: it only blocks until streaming has started.

EDIT: After 1190, what this requirement will boil down to is "make StreamIn.requestRanges return a Future".
EDIT2: If it isn't out of scope, making StreamIn.requestRanges take an optional column family argument would help as well.)



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890794#action_12890794 ] 

Gary Dusbabek commented on CASSANDRA-1189:
------------------------------------------

This looks good.

I think both approaches 1 and 2 can be combined if you're willing to throw in pending file lists as part of the stream header when the source replies to the initial stream request.  When the destination gets that data (along with the first stream), it can then request specific files that it now knows about.

If we can trust TCP I don't think we need checksums.

It sounds like we need three basic kinds of messages: request, response and status?



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment:     (was: 0001-Refactored-streaming-to-make-it-more-streamlined.patch)



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment:     (was: 0002-Test-cases-for-Streaming-Messages.patch)



[jira] Issue Comment Edited: (CASSANDRA-1189) Refactor streaming

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878736#action_12878736 ] 

Stu Hood edited comment on CASSANDRA-1189 at 6/18/10 12:22 PM:
---------------------------------------------------------------

AntiEntropyService.RepairSession in trunk and 0.6 only has a local view, so it can't know when other nodes have finished streaming data for repairs: it only blocks until streaming has started.

EDIT: After 1190, what this requirement will boil down to is "make StreamIn.requestRanges return a Future".

      was (Author: stuhood):
    If streaming sessions/files were given unique ids, it would be helpful for repairs as well.

AntiEntropyService.RepairSession in trunk and 0.6 only has a local view, so it can't know when other nodes have finished streaming data for repairs: it only blocks until streaming has started. If streaming sessions had unique ids, the destination node could block for notification that a particular stream id had finished before indicating that the repair was finished.
  


[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment: 0002-Test-cases-for-Streaming-Messages.patch
                0003-1189-Fixes-v1.patch

Applied most of the suggested changes; I wasn't able to simplify StreamHeader, StreamContext and PendingFile.

I've run various tests for Request & Transfer and they look good. However, there was one case where I got an error flushing the SSTable; it looks like compaction happened before the file transfer could be completed, on a request. I'm trying to reproduce that; in the meantime, here's the patch with updates.



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment: 0001-Refactored-streaming-to-make-it-more-streamlined.patch

There are a lot of changes in this one. I'll try to explain some of them here.

- Anything to do with StreamInitiate is removed, and some of what it did is incorporated into StreamIn and StreamOut.
- StreamContext is added; it is InetAddress+sessionId and acts as the key for all transactions.
- StreamHeader is added, which contains info on each stream as opposed to doing all the initiate stuff.

Transfers:
- Source maintains the list of files to stream in StreamOutManager. There's one per context.
- Destination doesn't maintain anything except for when the stream is active via StreamInManager.activeStreams.
- Destination sends a FileStatus message for the received stream

Requests:
- Destination initiates a request context in StreamInManager.
- Source compiles the list of files and adds it to a StreamOutManager; this is just for book-keeping purposes and troubleshooting.
- Source streams first file with header containing info of all the other files.
- Destination maintains the list of files in StreamInManager and takes control from this point.
- Destination requests the next file until done and re-requests on error
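
A minimal sketch of the request flow described above, assuming a context key made of the peer address plus a session id and a per-context queue of pending file names on the destination. The class and method names (StreamContextSketch, PendingFilesSketch, register, nextToRequest, markReceived) are hypothetical and are not taken from the patch; the real StreamContext and StreamInManager may look quite different.

```java
import java.net.InetAddress;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for StreamContext: peer address plus session id.
final class StreamContextSketch {
    final InetAddress host;
    final long sessionId;

    StreamContextSketch(InetAddress host, long sessionId) {
        this.host = host;
        this.sessionId = sessionId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StreamContextSketch)) return false;
        StreamContextSketch other = (StreamContextSketch) o;
        return sessionId == other.sessionId && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, sessionId);
    }
}

// Hypothetical destination-side bookkeeping: one queue of pending files per context.
// The per-context queues are plain ArrayDeques, so this sketch assumes each
// context is driven by a single thread.
final class PendingFilesSketch {
    private final Map<StreamContextSketch, Deque<String>> pending = new ConcurrentHashMap<>();

    // Called when the first stream arrives with a header listing the files to fetch.
    void register(StreamContextSketch ctx, List<String> files) {
        pending.put(ctx, new ArrayDeque<>(files));
    }

    // Returns the next file to request, or null when nothing is pending for the context.
    String nextToRequest(StreamContextSketch ctx) {
        Deque<String> q = pending.get(ctx);
        return q == null ? null : q.peekFirst();
    }

    // Marks a file as received. On error the caller would skip this call and
    // simply request the same file again.
    void markReceived(StreamContextSketch ctx, String file) {
        Deque<String> q = pending.get(ctx);
        if (q != null && q.remove(file) && q.isEmpty()) {
            pending.remove(ctx);
        }
    }
}
```

Because the key includes the session id, two sessions with the same peer keep separate pending lists, which is the property the old node-keyed scheme lacked.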



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894179#action_12894179 ] 

Gary Dusbabek commented on CASSANDRA-1189:
------------------------------------------

I'm still testing and looking at things, but in the interest of getting you some feedback...
* FileStreamTask-no point in keeping a reference to file since it could be had from header.
* StreamHeader-the protected fields should be final. Consider renaming "transfer" to something that better indicates the stream was pushed as opposed to requested. "containsPendingFiles" could be made redundant by converting to a method that checks for null-ness of "pending".
* IncomingStreamReader makes a !=null check in the constructor that is redundant.
* MessageSerDeser is only used by its test.  It and the test can be removed.
* StreamInManager-instance() needs a better name ("get" or "put"). remove() could be private.
* StreamingRequestTest is not valid. It passes whether or not the streaming has taken place (because it adds the sstables).
* StreamOut-consider renaming open() to something that indicates a flush is happening. transferSSTablesForRequest() could be private. The first debug statement doesn't include a {} for the second parameter.
* StreamRequestMetadata-protected variables should be final. singleFile could be made redundant by replacing with a method that checks for null-ness of file.
* Why doesn't SO.transferSSTablesForRequest enqueue multiple files and have the stream task remove them from pending? I think we still have the ordering problem, since SOM.removePending still pops the last file.
* calls to SIM.requestFile can be inlined to SI.requestFile

Having stream metadata spread across four classes (StreamRequestMetadata, PendingFile, StreamHeader and StreamContext) makes me think that we have more room for simplification.  What do you think?
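
To illustrate two of the suggestions above (final fields, and replacing the "containsPendingFiles" flag with a method that checks "pending" for null), here is a rough sketch. StreamHeaderSketch and its fields are hypothetical and simplified; the actual StreamHeader in the patch may carry different data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified header. Field names are assumptions for illustration.
final class StreamHeaderSketch {
    final long sessionId;
    final String file;           // the file carried by this stream
    final List<String> pending;  // remaining files, or null when none were announced

    StreamHeaderSketch(long sessionId, String file, List<String> pending) {
        this.sessionId = sessionId;
        this.file = file;
        // Defensive copy so the header cannot change after construction.
        this.pending = pending == null
                ? null
                : Collections.unmodifiableList(new ArrayList<>(pending));
    }

    // Derived from 'pending' instead of being stored as a separate boolean field.
    boolean containsPendingFiles() {
        return pending != null;
    }
}
```

With the flag derived rather than stored, the header cannot end up in a state where the boolean and the list disagree.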



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886839#action_12886839 ] 

Nirmal Ranganathan commented on CASSANDRA-1189:
-----------------------------------------------

I'm just reiterating the current code to help in refactoring decisions. 
- Stream_Request is invoked from StorageService during init and decommission on nodes and in AntiEntropyService to perform streaming repair. 
- Destination invokes STREAM_REQUEST (Compiles a set of ranges that it needs from the source)
- Source responds with STREAM_INITIATE (Prepares the request ranges and sends a list of pending files)
- Destination acknowledges with STREAM_INITIATE_DONE (Adds to list of pending files per node)
- Source starts streaming the first file from the list of files it has prepared for that Destination node.
- Destination receives the file and returns a Stream_Status.
- Based on the status, Source restreams the file or streams the next file until complete.

Limitations:
- Source can transfer multiple streams, but only one per destination node.
- Destination can receive multiple streams, but again only one per source node.
- The ack process for pending files breaks for multiple streams or overlapping streams.
- List of pending files streamed needs to be in order; the current scheme maintains that order.
- No session maintained on a per stream basis, session is loosely based on node.
- Each streamed file has no header to describe the file, that metadata was transferred during STREAM_INITIATE and the destination goes off that.
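
The ACK limitation above can be illustrated with a toy model. This is not the actual Cassandra code: it only assumes, as described in the list, that pending files are tracked per node with no session id and that an acknowledgement is matched against the head of that list. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: pending files tracked per destination node, with no session id.
final class NodeKeyedPendingSketch {
    private final Map<String, Deque<String>> pendingByNode = new HashMap<>();

    // Any service that wants to stream to 'node' appends to the same queue.
    void enqueue(String node, String file) {
        pendingByNode.computeIfAbsent(node, n -> new ArrayDeque<String>()).addLast(file);
    }

    // The ACK carries no file or session identity in this model, so the head of
    // the queue is assumed to be the file that was just acknowledged.
    String ackNext(String node) {
        Deque<String> q = pendingByNode.get(node);
        return q == null ? null : q.pollFirst();
    }
}
```

If a repair enqueues "repair-1" and a bootstrap then enqueues "bootstrap-1" for the same node, an ACK that was meant for "bootstrap-1" removes "repair-1" instead. Keying the bookkeeping by session rather than by node alone avoids this kind of mix-up.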



[jira] Assigned: (CASSANDRA-1189) Refactor streaming

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek reassigned CASSANDRA-1189:
----------------------------------------

    Assignee: Nirmal Ranganathan  (was: Gary Dusbabek)



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890813#action_12890813 ] 

Nirmal Ranganathan commented on CASSANDRA-1189:
-----------------------------------------------

Yes, combining 1 & 2 will work out, reducing that extra message.

We'll have the following:
- Stream (The response part; it doesn't use a verb currently and won't going forward either)
- StreamRequest (Reuse the Stream_Request verb)
- StreamStatus (Reuse Stream_Finished verb)
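
As a rough sketch, the three message kinds listed above could be modelled as an enum that records which existing verb, if any, each one reuses. The enum and its constant names are hypothetical; only the verb names Stream_Request and Stream_Finished come from this comment.

```java
// Hypothetical enum for illustration; not part of the patch.
enum StreamMessageKindSketch {
    STREAM(null),                      // the file data itself, sent without a verb
    STREAM_REQUEST("Stream_Request"),  // destination asks the source for data
    STREAM_STATUS("Stream_Finished");  // destination reports on a received file

    final String verb; // null when the message kind does not use a verb

    StreamMessageKindSketch(String verb) {
        this.verb = verb;
    }

    boolean hasVerb() {
        return verb != null;
    }
}
```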




[jira] Assigned: (CASSANDRA-1189) Refactor streaming

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek reassigned CASSANDRA-1189:
----------------------------------------

    Assignee: Gary Dusbabek  (was: Nirmal Ranganathan)



[jira] Issue Comment Edited: (CASSANDRA-1189) Refactor streaming

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878736#action_12878736 ] 

Stu Hood edited comment on CASSANDRA-1189 at 6/23/10 4:36 PM:
--------------------------------------------------------------

AntiEntropyService.RepairSession in trunk and 0.6 only has a local view, so it can't know when other nodes have finished streaming data for repairs: it only blocks until streaming has started.

EDIT: After 1190, what this requirement will boil down to is "make StreamIn.requestRanges return a Future".
EDIT2: If it isn't out of scope, making StreamIn.requestRanges take an optional column family argument would help as well.

      was (Author: stuhood):
    AntiEntropyService.RepairSession in trunk and 0.6 only has a local view, so it can't know when other nodes have finished streaming data for repairs: it only blocks until streaming has started.

EDIT: After 1190, what this requirement will boil down to is "make StreamIn.requestRanges return a Future".
  


[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment: 0001-Refactored-streaming-to-make-it-more-streamlined.patch
                0002-Test-cases-for-Streaming-Messages.patch



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment:     (was: 0001-Refactored-streaming-to-make-it-more-streamlined.patch)



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment:     (was: 0002-Test-cases-for-Streaming-Messages.patch)



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment: 0002-Test-cases-for-Streaming-Messages.patch
                0001-Refactored-streaming-to-make-it-more-streamlined.patch

Updated with some bugfixes and more complete unit tests



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878736#action_12878736 ] 

Stu Hood commented on CASSANDRA-1189:
-------------------------------------

If streaming sessions/files were given unique ids, it would be helpful for repairs as well.

AntiEntropyService.RepairSession in trunk and 0.6 only has a local view, so it can't know when other nodes have finished streaming data for repairs: it only blocks until streaming has started. If streaming sessions had unique ids, the destination node could block for notification that a particular stream id had finished before indicating that the repair was finished.
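
One way to picture the idea of blocking on a stream id, sketched under the assumption that each session has a unique long id. StreamCompletionSketch and its methods are hypothetical, and CompletableFuture is used only for brevity (it postdates the Cassandra versions discussed here).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: one future per stream session id.
final class StreamCompletionSketch {
    private final Map<Long, CompletableFuture<Void>> sessions = new ConcurrentHashMap<>();

    // Would be called when a stream request is issued; the caller keeps the future.
    CompletableFuture<Void> begin(long sessionId) {
        return sessions.computeIfAbsent(sessionId, id -> new CompletableFuture<Void>());
    }

    // Would be called once the last file of the session has been received.
    void finished(long sessionId) {
        CompletableFuture<Void> f = sessions.remove(sessionId);
        if (f != null) {
            f.complete(null);
        }
    }
}
```

A repair session could then hold the future returned by begin() and call get() or join() on it, so that it reports the repair as finished only after the stream has actually completed rather than merely started.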

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7
>            Reporter: Gary Dusbabek
>            Assignee: Gary Dusbabek
>            Priority: Critical
>             Fix For: 0.7
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment: 0003-1189-Fixes-v1.patch

Fixed the issue with the streaming request. Both requests and transfers are now working.

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7 beta 1
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: 0001-Refactored-streaming-to-make-it-more-streamlined.patch, 0002-Test-cases-for-Streaming-Messages.patch, 0003-1189-Fixes-v1.patch
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890762#action_12890762 ] 

Nirmal Ranganathan commented on CASSANDRA-1189:
-----------------------------------------------

Here are some proposed changes; please comment with feedback. Streaming occurs in two situations:

Source transfers to Destination (Anti-entropy repair, node decommission, possibly bulk import)
- In each case, Source has a list of sstable files that it needs to transfer to Destination.
- Source maintains the list of all the files and creates a session id for transferring this set of files.
- Source streams the first file. The header contains a new StreamHeader with the PendingFile info embedded.
- Destination receives the stream, which carries all the info for the file, and responds with a StreamStatus message once done.
- If the StreamStatus is success, Source continues with the next file; if not, it retransfers the file. This repeats until all files are complete.
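The source-driven flow above can be sketched as a loop over a per-session file queue. This is a minimal illustration, not the patch itself: PushSession, run and sendAndAwaitStatus are hypothetical names, files are represented as plain strings, and the retry cap is an added assumption (the proposal itself simply retransfers until all files are complete).

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch of the source side; names are not from the Cassandra code base.
final class PushSession {
    final long sessionId;                   // identifies this set of files
    private final Queue<String> remaining;  // files still to be transferred
    private final int maxAttemptsPerFile;   // assumption: give up eventually

    PushSession(long sessionId, List<String> files, int maxAttemptsPerFile) {
        this.sessionId = sessionId;
        this.remaining = new ArrayDeque<>(files);
        this.maxAttemptsPerFile = maxAttemptsPerFile;
    }

    /**
     * Streams each file in turn. sendAndAwaitStatus stands in for "stream the
     * file, then wait for the StreamStatus reply"; it returns true on success.
     * Returns the total number of send attempts.
     */
    int run(Predicate<String> sendAndAwaitStatus) {
        int attempts = 0;
        while (!remaining.isEmpty()) {
            String file = remaining.peek();
            int tries = 0;
            boolean ok = false;
            while (!ok) {
                if (tries == maxAttemptsPerFile) {
                    throw new IllegalStateException(
                            "session " + sessionId + ": giving up on " + file);
                }
                tries++;
                attempts++;
                ok = sendAndAwaitStatus.test(file); // retransfer on failure
            }
            remaining.remove(); // only drop the file after a success status
        }
        return attempts;
    }
}
```

Because a file leaves the queue only after a success status, the queue always reflects what still has to be sent for that session.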

(Approach 1) Destination requests from Source (Anti-entropy repair, bootstrap, possibly bulk export)
- Destination compiles a list of ranges and sends a StreamRequest message to Source, attaching a session id to keep track of the request.
- Based on the ranges, Source compiles a list of PendingFiles and sends a StreamRequestResponse message with the list of files.
- Destination now has the list of files and can maintain state.
- Destination sends a StreamRequest for a file from the list, with the session id and file descriptor info attached.
- Source streams the file to Destination.
- Based on the transfer status, Destination requests the next file or re-requests the same file, until all files are transferred.
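In Approach 1 the destination holds the state and pulls one file at a time. The sketch below shows that shape only: SourceNode, PullSession and their methods are hypothetical stand-ins for the StreamRequest / StreamRequestResponse exchange, ranges and files are plain strings, and the attempt limit is an assumption not present in the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of the source as seen from the destination.
interface SourceNode {
    // Stands in for StreamRequest (ranges) -> StreamRequestResponse (file list).
    List<String> listFiles(long sessionId, List<String> ranges);

    // Stands in for a per-file StreamRequest; true if the file arrived intact.
    boolean streamFile(long sessionId, String file);
}

// Hypothetical destination-side driver.
final class PullSession {
    /** Requests every file for the given ranges and returns the files received. */
    static List<String> fetchAll(SourceNode source, long sessionId,
                                 List<String> ranges, int maxAttemptsPerFile) {
        List<String> received = new ArrayList<>();
        // The destination now owns the file list, so it tracks progress itself.
        for (String file : source.listFiles(sessionId, ranges)) {
            int failures = 0;
            while (!source.streamFile(sessionId, file)) {
                // Re-request the same file until it succeeds or we give up.
                if (++failures >= maxAttemptsPerFile) {
                    throw new IllegalStateException(
                            "session " + sessionId + ": could not fetch " + file);
                }
            }
            received.add(file);
        }
        return received;
    }
}
```

The trade-off against Approach 2 is where the bookkeeping lives: here the source only answers requests, while the destination decides what to ask for next.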

(Approach 2) Destination requests from Source (Anti-entropy repair, bootstrap, possibly bulk export)
- Destination compiles a list of ranges and sends a StreamRequest message to Source, attaching a session id to keep track of the request.
- Source compiles a list of PendingFiles from the requested ranges. Source maintains state.
- Source streams file 1 with the StreamHeader attached.
- Destination receives the file and responds with a StreamStatus.
- Based on the status, Source transfers the next file or re-transfers the same file.

Changes to Protocol for File Streaming:
- Current -> | Protocol magic | Header | Body (File contents) |
- Proposed -> | Protocol magic | Header | StreamHeader size | StreamHeader | Body (File contents) |
- The protocol for all other Messages remains the same: the format is unchanged, but the content will vary.
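The proposed layout can be illustrated with a length-prefixed frame. This is a sketch under assumptions, not the actual wire code: StreamFrame is a hypothetical class, the magic value is a placeholder, the existing Header is reduced to a single int, and the StreamHeader is treated as opaque bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical framing helper; not the real Cassandra serialization code.
final class StreamFrame {
    static final int PROTOCOL_MAGIC = 0x12345678; // placeholder value, an assumption

    /** Builds | magic | header | StreamHeader size | StreamHeader | body |. */
    static byte[] write(int header, byte[] streamHeader, byte[] body) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(PROTOCOL_MAGIC);
            out.writeInt(header);              // existing header, simplified to an int
            out.writeInt(streamHeader.length); // new: size prefix
            out.write(streamHeader);           // new: session id, PendingFile info, ...
            out.write(body);                   // file contents
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back just the StreamHeader bytes from a frame. */
    static byte[] streamHeaderOf(byte[] frame) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
            if (in.readInt() != PROTOCOL_MAGIC) {
                throw new IOException("bad protocol magic");
            }
            in.readInt();                      // skip the existing header
            int size = in.readInt();
            if (size < 0) {
                throw new IOException("negative StreamHeader size");
            }
            byte[] streamHeader = new byte[size];
            in.readFully(streamHeader);        // everything after this is the body
            return streamHeader;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The size prefix is what lets the receiver separate the per-file metadata from the file contents without relying on the order in which files arrive.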

Effects of the mentioned changes:
- There can be multiple transfers per source and destination.
- No file order is required, which prevents overlapping streams from breaking anything.
- Other services can transfer files without a problem.
- Initiate and Initiate Done will be removed, making the process a little cleaner.
- Makes it easier to add a layer on top for bulk imports/exports.

Questions:
- The current streaming does not seem to maintain persistent state if a node fails during streaming. Is that something that needs to be considered?
- Do we want to add checksums?

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).



[jira] Assigned: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan reassigned CASSANDRA-1189:
---------------------------------------------

    Assignee: Nirmal Ranganathan  (was: Gary Dusbabek)

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).



[jira] Updated: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nirmal Ranganathan updated CASSANDRA-1189:
------------------------------------------

    Attachment:     (was: 0003-1189-Fixes-v1.patch)

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7 beta 1
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: 0001-Refactored-streaming-to-make-it-more-streamlined.patch, 0002-Test-cases-for-Streaming-Messages.patch, 0003-1189-Fixes-v1.patch
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Nirmal Ranganathan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894389#action_12894389 ] 

Nirmal Ranganathan commented on CASSANDRA-1189:
-----------------------------------------------

I'll attach a fix with all the suggested changes.
- SO.transferSSTablesForRequest does enqueue files, but not for transfer; it waits for the requesting node to get one file at a time. I think moving the remove to FileStreamTask will put it in one location, instead of in both StreamRequestVerbHandler and StreamOut. Is that what you had in mind?
- For the streaming metadata, we can consolidate StreamRequestMetadata and StreamRequestMessage. PendingFile contains file-specific metadata, StreamContext is used for all stream-related messages, and StreamHeader is used only with streaming. I'll see how much I can simplify with the first fix, and we can proceed from there.

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7 beta 1
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: 0001-Refactored-streaming-to-make-it-more-streamlined.patch, 0002-Test-cases-for-Streaming-Messages.patch
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).



[jira] Commented: (CASSANDRA-1189) Refactor streaming

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896363#action_12896363 ] 

Hudson commented on CASSANDRA-1189:
-----------------------------------

Integrated in Cassandra #510 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/510/])
    streaming changes. removed and combined several messages. fixed out-of-order bug. introduce stream sessions, stream headers. patch by rnirmal, reviewed by gdusbabek. CASSANDRA-1189


> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7 beta 1
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: 0001-Refactored-streaming-to-make-it-more-streamlined.patch, 0002-Test-cases-for-Streaming-Messages.patch, 0003-1189-Fixes-v1.patch
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes.  Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).
