Posted to commits@cassandra.apache.org by "Vijay (Created) (JIRA)" <ji...@apache.org> on 2012/01/10 23:04:39 UTC

[jira] [Created] (CASSANDRA-3721) Staggering repair

Staggering repair
-----------------

                 Key: CASSANDRA-3721
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.1
            Reporter: Vijay
            Assignee: Vijay
            Priority: Minor


Currently, repair runs on all the nodes at once, which makes the range of data being repaired hot (higher latency on reads).

Sequence:
1) Send a repair request to all of the nodes so we can hold references to the SSTables (as of the point at which repair was initiated).
2) Send Validation to one node at a time (each node releases its references once validation completes).
3) Hold the reference to the tree in the requesting node and, once everything is complete, start the diff.

We can also serialize the streaming part so that no more than 1 node is involved in streaming at a time.
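
The sequence above can be sketched roughly as follows. This is only an illustration, not the patch: StaggeredRepairSketch, Node, node() and run() are hypothetical names, and the "tree" is a plain string label standing in for a real validation result.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the staggered sequence; these names are not Cassandra's. */
class StaggeredRepairSketch {

    /** Stand-in for a replica taking part in the repair. */
    interface Node {
        String name();
        void snapshot();   // step 1: pin the SSTables as of repair start
        String validate(); // step 2: build a tree from the snapshot, then release it
    }

    /** Trivial in-memory node whose "tree" is just a label. */
    static Node node(final String name) {
        return new Node() {
            private boolean snapshotted;
            public String name() { return name; }
            public void snapshot() { snapshotted = true; }
            public String validate() {
                if (!snapshotted) throw new IllegalStateException("no snapshot on " + name);
                snapshotted = false; // references are released once validation is done
                return "tree-" + name;
            }
        };
    }

    /** Runs steps 1-3 and returns the ordered event log so the staggering is visible. */
    static List<String> run(List<Node> nodes) {
        List<String> log = new ArrayList<>();
        // 1) Snapshot everywhere first, so every tree describes the same point in time.
        for (Node n : nodes) {
            n.snapshot();
            log.add("snapshot " + n.name());
        }
        // 2) Validate one node at a time; the requester keeps each returned tree.
        List<String> trees = new ArrayList<>();
        for (Node n : nodes) {
            trees.add(n.validate());
            log.add("validated " + n.name());
        }
        // 3) Only once every tree is in hand does differencing start.
        log.add("diff " + trees.size() + " trees");
        return log;
    }
}
```

The point of the ordering is that only one node pays the validation cost at any moment, while the up-front snapshots keep the trees comparable.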

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3721) Staggering repair

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191146#comment-13191146 ] 

Sylvain Lebresne commented on CASSANDRA-3721:
---------------------------------------------

I did a quick pass on the patches. It seems to me that the refactoring of AntiEntropyService this patch does is largely orthogonal to the issue at hand. All that seems needed for this issue is to allow sending the treeRequests one after the other. That should be doable with 2 lines in RepairJob.addTree(), and maybe a few more lines to send the snapshot commands. This would have the advantage of making it clear that the patch isn't breaking anything.

I am not saying that the AntiEntropyService synchronization code is the cleanest we have, and maybe a refactoring could improve that. I'm not necessarily convinced such a refactoring is necessary at this point, but if you care enough about it, I'm not strongly against it either. I do want to point out that doing that refactoring as part of this ticket will almost surely put it out of reach for 1.1, as it will make review more complicated and make it unreasonable to shove this in a handful of days before the freeze.

As a side note, I spotted 2 changes that seem gratuitous and don't appear to improve the code:
* In TreeRequestVerbHandler.doVerb, you renamed the variables. However, I think the new name, cloneRequest, is misleading as we're not really doing a clone.
* Is there a reason to change RepairFuture to not be a Future anymore? Even if we don't really use it, it can be convenient to have it implement the native Future interface, especially given it's called RepairFuture.
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-staggering-repair-with-snapshot.patch

        

[jira] [Commented] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183715#comment-13183715 ] 

Vijay commented on CASSANDRA-3721:
----------------------------------

Will do, will add something like nt repair withsnapshot
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor

        

[jira] [Updated] (CASSANDRA-3721) Staggering repair

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3721:
----------------------------------------

    Attachment: 3721.patch

Looking at the global patch for this, there are a few things I'm not a total fan of in the DistributedJob approach:
* I think it tries to generalize too much, making DistributedJob hard to follow in itself. Typically, why wouldn't the parallel case of DJ use the initRequest method? Yes, technically that's because it is used to send snapshot commands for RepairJob only in the sequential case, but that makes for a poor abstraction imho. Another "proof" of that is the fact that DifferencingJob actually doesn't use half of the features DJ is trying to abstract.
* As said earlier, it changes more code than we really need to, including completely changing how repair synchronization is done. Given that I'm not sure it really improves things, I'd prefer avoiding that, if only for the sake of having less chance of introducing bugs.
* I believe the differences between the sequential and parallel path would be easier to follow using sub-classing. That may be a personal preference though.

Attaching a version that tries to abstract the sequential vs parallel request business but only that. The rest of the patch is roughly the same as the initial patch except that it's rebased.
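
For readers following along, the sub-classing idea can be pictured with the sketch below. It is not the content of 3721.patch: the class names (RequestCoordinatorSketch, ParallelSketch, SequentialSketch) and methods are assumptions made for illustration, and "sending" a request is reduced to recording the endpoint in a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical base class: only the dispatch policy differs between subclasses. */
abstract class RequestCoordinatorSketch {
    final List<String> sent = new ArrayList<>(); // stands in for real network sends

    abstract void add(String endpoint);       // register an endpoint to contact
    abstract void start();                    // begin dispatching requests
    abstract int completed(String endpoint);  // record a response; returns how many remain

    void send(String endpoint) { sent.add(endpoint); }
}

/** Parallel path: every request goes out as soon as start() is called. */
class ParallelSketch extends RequestCoordinatorSketch {
    private final Set<String> pending = new LinkedHashSet<>();

    void add(String endpoint) { pending.add(endpoint); }

    void start() {
        for (String endpoint : pending) send(endpoint);
    }

    int completed(String endpoint) {
        pending.remove(endpoint);
        return pending.size();
    }
}

/** Sequential path: the next request is sent only after the previous one completes. */
class SequentialSketch extends RequestCoordinatorSketch {
    private final Queue<String> queue = new ArrayDeque<>();

    void add(String endpoint) { queue.add(endpoint); }

    void start() {
        if (!queue.isEmpty()) send(queue.peek());
    }

    int completed(String endpoint) {
        queue.remove(endpoint);
        start(); // hand the turn to the next endpoint, if any
        return queue.size();
    }
}
```

With this shape the caller only chooses which subclass to instantiate; the rest of the repair code can treat both paths the same way.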

                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch, 3721.patch

        

[jira] [Commented] (CASSANDRA-3721) Staggering repair

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208268#comment-13208268 ] 

Sylvain Lebresne commented on CASSANDRA-3721:
---------------------------------------------

I think there has been a misunderstanding. I attached 3721.patch to address my concerns with the initial 0001-staggering-repair-with-snapshot.patch. I'm personally good with 3721.patch (it may be that my poor wording suggested otherwise, sorry if that's the case), except that it needs review of course.

Vijay did commit 3721.patch, so I think that was right (though, Vijay, maybe you could have made it clearer in your comment that you did review the last version, +1ed it and thus committed it).
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch, 3721.patch

        

[jira] [Commented] (CASSANDRA-3721) Staggering repair

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192272#comment-13192272 ] 

Sylvain Lebresne commented on CASSANDRA-3721:
---------------------------------------------

What I suggest for now is to separate the snapshotting parts from this patch (we can even spawn a new ticket) because it changes the network protocol, so if we don't get it for 1.1, we'll kind of have to wait for 1.2 with our current rule of not changing the protocol version during a major cycle. I won't have time to look at the rest for the 1.1 freeze, but we can get it for 1.1.1. 
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-staggering-repair-with-snapshot.patch

        

[jira] [Updated] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3721:
-----------------------------

    Attachment: 0001-add-snapshot-command.patch
    
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch

        

[jira] [Commented] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191245#comment-13191245 ] 

Vijay commented on CASSANDRA-3721:
----------------------------------

>>>  But it should be doable with 2 lines in RepairJob.addTree(), and maybe a few more lines to send the snapshot commands
The problem is that we would have to implement the same thing that DistributedJob does (found in the attached patch), because we have to wait for the job to complete on the remote server. So we would either wait on a SimpleCondition, creating a condition for every request sent, or have the callback start the next job (a special case for snapshot repair).
+ We have to do the same thing for Differencing that we did for sendTree, because it has performStreamingRepair().
+ We also have to clear the snapshot if it fails.
+ I thought of implementing CASSANDRA-3486 after this, which will benefit from this refactor too.
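
To make the "callback needs to do the next job" option concrete, here is a rough sketch. It swaps in java.util.concurrent.CompletableFuture for the condition/callback plumbing discussed above, so it is not how either patch does it; ChainedValidationSketch, validateInTurn and the clearSnapshot hook are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: each validation starts only when the previous one has completed. */
class ChainedValidationSketch {

    /**
     * @param endpoints     nodes to validate, in order
     * @param validate      asynchronous validation request; completes with that node's tree
     * @param clearSnapshot cleanup hook, invoked for every endpoint if anything fails
     */
    static CompletableFuture<List<String>> validateInTurn(
            List<String> endpoints,
            Function<String, CompletableFuture<String>> validate,
            Consumer<String> clearSnapshot) {
        CompletableFuture<List<String>> chain =
                CompletableFuture.<List<String>>completedFuture(new ArrayList<String>());
        for (String endpoint : endpoints) {
            // The completion of the previous request is what triggers the next one.
            chain = chain.thenCompose(trees ->
                    validate.apply(endpoint).thenApply(tree -> {
                        trees.add(tree);
                        return trees;
                    }));
        }
        // On any failure the remaining requests are skipped and the snapshots are cleared.
        return chain.whenComplete((trees, error) -> {
            if (error != null) endpoints.forEach(clearSnapshot);
        });
    }
}
```

A failed validation short-circuits the rest of the chain, so later nodes are never asked to validate, and the cleanup hook runs once for each endpoint.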

Do you think it is worth doing a simple patch along the lines of what you have mentioned for 1.1 and keeping the refactor for 1.2?

>>> I spotted 2 changes that seems gratuitous
Those were unintentional; I should have checked before submitting. I will fix that.
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-staggering-repair-with-snapshot.patch

        

[jira] [Updated] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3721:
-----------------------------

    Attachment: 0001-staggering-repair-with-snapshot.patch

This patch will stagger repairs if -snapshot option is used.

nt repair -snapshot -pr &
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-staggering-repair-with-snapshot.patch

        

[jira] [Commented] (CASSANDRA-3721) Staggering repair

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192982#comment-13192982 ] 

Sylvain Lebresne commented on CASSANDRA-3721:
---------------------------------------------

Ok, +1 on part 1 (add-snapshot-command). I've committed it. I'll look more closely at the rest soonish.
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch

        

[jira] [Reopened] (CASSANDRA-3721) Staggering repair

Posted by "Jonathan Ellis (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-3721:
---------------------------------------


Reverted -- Sylvain said this patch only addressed one of his major concerns.
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch, 3721.patch

        

[jira] [Updated] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3721:
-----------------------------

    Attachment: 0001-add-snapshot-command.patch

Hi Sylvain, please see the attached patch; it adds the command. Ideally this patch should not break backward compatibility, because it just adds a command...
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch

        

[jira] [Updated] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3721:
-----------------------------

    Reviewer: slebresne
    
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-staggering-repair-with-snapshot.patch

        

[jira] [Commented] (CASSANDRA-3721) Staggering repair

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183652#comment-13183652 ] 

Jonathan Ellis commented on CASSANDRA-3721:
-------------------------------------------

I think you'd basically need to repair against a snapshot, or you come back to the pre-CASSANDRA-2816 bad old days.
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor

        

[jira] [Resolved] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay resolved CASSANDRA-3721.
------------------------------

    Resolution: Fixed

Sorry for the confusion, +1 from me and I committed it again... I did test it and the unit tests pass. Thanks!
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch, 3721.patch

        

[jira] [Updated] (CASSANDRA-3721) Staggering repair

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3721:
-----------------------------

    Attachment:     (was: 0001-add-snapshot-command.patch)
    
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-add-snapshot-command.patch, 0001-staggering-repair-with-snapshot.patch