Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/05/05 13:08:03 UTC

[jira] [Created] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Have the repair of a range repair *all* the replica for that range
------------------------------------------------------------------

                 Key: CASSANDRA-2610
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2610
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8 beta 1
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Minor
             Fix For: 0.8.1


Say you have a range R whose replicas are A, B and C. If you run repair on node A for that range R, then when the repair ends you only know that A is fully repaired; B and C are not. That is, B and C are up to date with what A contained before the repair, but they are not up to date with one another.

This makes it a pain to schedule "optimal" cluster repairs, i.e. repairing a full cluster without doing the same work twice (you would still have to run a repair on B or C, which makes A, B and C redo a validation compaction on R, and with more replicas it gets even more annoying).

However, it is fairly easy, during the first repair on A, to have A compare all the Merkle trees, including the ones from B and C against each other, and ask B and C to stream whatever differences they have between themselves.
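
As an illustrative aside (not part of the ticket), here is a minimal sketch of the proposed behaviour, assuming hypothetical names (ReplicaTree, MerkleTree, diff, requestStreaming) rather than Cassandra's actual AntiEntropyService API: once the coordinator has a tree from every replica, it compares every pair of trees and asks the two endpoints of each differing pair to stream the mismatching ranges between themselves.

{code:java}
// Sketch only: compare every pair of replica trees, not just (coordinator, other).
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

class PairwiseRepairSketch
{
    /** Stand-in for Cassandra's Merkle tree type. */
    interface MerkleTree { }

    static class ReplicaTree
    {
        final InetAddress endpoint;
        final MerkleTree tree;
        ReplicaTree(InetAddress endpoint, MerkleTree tree) { this.endpoint = endpoint; this.tree = tree; }
    }

    void syncAllReplicas(List<ReplicaTree> trees)
    {
        for (int i = 0; i < trees.size(); i++)
        {
            for (int j = i + 1; j < trees.size(); j++)
            {
                ReplicaTree left = trees.get(i);
                ReplicaTree right = trees.get(j);
                // Ranges on which the two trees disagree (hypothetical helper).
                List<Object> differences = diff(left.tree, right.tree);
                if (!differences.isEmpty())
                    requestStreaming(left.endpoint, right.endpoint, differences);
            }
        }
    }

    // Hypothetical stand-ins for the tree comparison and the streaming request.
    List<Object> diff(MerkleTree l, MerkleTree r) { return new ArrayList<Object>(); }
    void requestStreaming(InetAddress a, InetAddress b, List<Object> ranges) { }
}
{code}

With N replicas this yields N*(N-1)/2 pairwise comparisons, so a single repair invocation brings every replica of the range in sync, not only the node the repair was started on.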

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Omid Aladini (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478001#comment-13478001 ] 

Omid Aladini commented on CASSANDRA-2610:
-----------------------------------------

This indeed makes repair across a cluster easier to manage, especially together with -pr (CASSANDRA-2606), but the downside is that all replicas for a range are affected once the data is streamed. In my case repair transfers a huge amount of data each time (possibly due to the Merkle tree precision, CASSANDRA-2698), causing hundreds of pending compactions that affect reads and counter-writes for the affected range. I'd prefer to have Cassandra calculate Merkle trees multiple times (which can be throttled) and keep quorum reads fast when only one replica is slowed down. Given that incremental repair (CASSANDRA-2699) is still in progress, do you think it makes sense to make repair-on-all-replicas optional, possibly via a flag on the node the repair is run on?

[jira] [Commented] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095916#comment-13095916 ] 

Hudson commented on CASSANDRA-2610:
-----------------------------------

Integrated in Cassandra #1067 (See [https://builds.apache.org/job/Cassandra/1067/])
    Make repair of a range sync all replica pairs for this range
patch by slebresne; reviewed by jbellis for CASSANDRA-2610

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164463
Files : 
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/net/MessagingService.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/AntiEntropyService.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamingRepairTask.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/UUIDGen.java
* /cassandra/trunk/test/unit/org/apache/cassandra/io/CompactSerializerTest.java
* /cassandra/trunk/test/unit/org/apache/cassandra/service/AntiEntropyServiceTestAbstract.java


[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2610:
--------------------------------------

    Attachment:     (was: 0003-cleanup-and-fix-private-reference.patch)

[jira] [Commented] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029381#comment-13029381 ] 

Jonathan Ellis commented on CASSANDRA-2610:
-------------------------------------------

I thought it already works like this (because when you repair A, you'll often see additional data stream to B, C). So +1 for making it conform to my mental picture. :)

[jira] [Commented] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Omid Aladini (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478231#comment-13478231 ] 

Omid Aladini commented on CASSANDRA-2610:
-----------------------------------------

That's great, I didn't know about that. This means that by throttling the sequentialized streaming, I might be able to let pending compactions resolve on the previously repaired replica, although tuning this would be a challenge (possibly by dynamically changing streamthroughput according to the number of pending compactions, which doesn't seem ideal).

[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2610:
--------------------------------------

    Attachment: 0003-cleanup-and-fix-private-reference.patch

03 fixes a reference to the private UUIDGen.instance field, and cleans up RepairJob to (1) use concurrent structures instead of synchronized and (2) track Differencer objects directly instead of using Pair<InetAddress, InetAddress> as proxies. Also adds some comments.

otherwise, lgtm.
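
As an illustrative aside, a minimal sketch of point (2) above, with hypothetical names rather than the actual patch: each pair of replicas to sync is represented by its Differencer task, kept in a concurrent collection since repair responses arrive on different threads. (Note that the remove-then-isEmpty check below is not one atomic step; that subtlety comes up again further down the thread.)

{code:java}
// Sketch only: track Differencer tasks directly instead of Pair<InetAddress, InetAddress> proxies.
import java.net.InetAddress;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class DifferencerTrackingSketch
{
    /** One tree-comparison/sync task between two replicas of the range. */
    static class Differencer
    {
        final InetAddress left;
        final InetAddress right;
        Differencer(InetAddress left, InetAddress right) { this.left = left; this.right = right; }
    }

    // Concurrent structure instead of a synchronized collection.
    private final Queue<Differencer> remaining = new ConcurrentLinkedQueue<Differencer>();

    void add(InetAddress a, InetAddress b)
    {
        remaining.add(new Differencer(a, b));
    }

    /** Called when one pair has been synced; true once no pair is left. */
    boolean completed(Differencer d)
    {
        remaining.remove(d);
        return remaining.isEmpty();   // not atomic with the remove above
    }
}
{code}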

[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2610:
--------------------------------------

    Attachment: 0003-cleanup-and-fix-private-reference.patch

new 03 adds back synchronization (with comments) and removes the "manual repair test."

[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2610:
----------------------------------------

    Attachment: 0001-Make-repair-repair-all-hosts.patch

Patch against 0.8.1. It applies on top of CASSANDRA-2433 because it changes enough of the common code that I don't want to deal with rebasing back and forth (and it actually reuses some of the refactoring from CASSANDRA-2433 anyway).

[jira] [Commented] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478019#comment-13478019 ] 

Sylvain Lebresne commented on CASSANDRA-2610:
---------------------------------------------

I think you may want to look at -snapshot (CASSANDRA-3721). It sequentializes both the Merkle tree computation and the streaming. In other words, if you have 3 replicas A, B and C for the range, A will compute its Merkle tree, then B, then C, and the same goes for the streaming phase. This exists exactly to avoid the "all replicas for the range are slow because they are all doing repair work" situation.
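
As an illustrative aside, a minimal sketch of the difference described above, with hypothetical names (Validator, requestTree, awaitTree) rather than Cassandra's actual repair code: in the parallel case every replica is asked for its tree up front, while in the sequential (-snapshot style) case the coordinator waits for each replica before moving on, so at most one replica at a time is busy with validation.

{code:java}
// Sketch only: parallel vs. sequential validation across the replicas of a range.
import java.net.InetAddress;
import java.util.List;

class SequentialValidationSketch
{
    /** Hypothetical stand-in for requesting/awaiting a replica's Merkle tree. */
    interface Validator
    {
        void requestTree(InetAddress replica);
        void awaitTree(InetAddress replica);
    }

    /** All replicas build their trees at the same time. */
    static void parallel(List<InetAddress> replicas, Validator v)
    {
        for (InetAddress r : replicas)
            v.requestTree(r);
        for (InetAddress r : replicas)
            v.awaitTree(r);
    }

    /** One replica at a time, so the others keep serving reads at full speed. */
    static void sequential(List<InetAddress> replicas, Validator v)
    {
        for (InetAddress r : replicas)
        {
            v.requestTree(r);
            v.awaitTree(r);
        }
    }
}
{code}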

[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2610:
----------------------------------------

    Attachment: 0002-Cleanup-log-messages-v2.patch
                0001-Make-repair-repair-all-hosts-v2.patch

v2 attached (based on trunk).

It includes a second patch that simply cleans up the messages logged by repair (so it is easier to follow what is going on). I could have done this in a separate ticket, but rebasing repair tickets against each other is getting old.

Note that I updated the MessagingService to list version 1.0, as imho this is much cleaner.

[jira] [Commented] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095403#comment-13095403 ] 

Sylvain Lebresne commented on CASSANDRA-2610:
---------------------------------------------

bq. use concurrent structures instead of synchronized

I think this doesn't work, or needs modifications elsewhere. The synchronized was making sure that, for a given job, there would be one and only one call to addTree that returns 0 (otherwise we would generate the differencers twice in that case). The synchronized was there to make the 'remove then return length' sequence atomic, more than to protect access to the structure itself. completedSynchronization has the same problem, even though there the consequence is benign: we would just print the sync message twice (which is still less good than with synchronized).
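
As an illustrative aside, a simplified sketch of the concern (hypothetical code, not the actual RepairJob): exactly one invocation of addTree must observe that no tree is still outstanding, otherwise the differencers get created twice. Keeping the method synchronized makes "remove the endpoint, then return how many are left" a single atomic step.

{code:java}
// Sketch only: why remove-then-count must be atomic in addTree.
import java.net.InetAddress;
import java.util.HashSet;
import java.util.Set;

class AddTreeSketch
{
    private final Set<InetAddress> waitingFor = new HashSet<InetAddress>();

    AddTreeSketch(Set<InetAddress> endpoints)
    {
        waitingFor.addAll(endpoints);
    }

    /**
     * Records the tree received from 'endpoint' and returns how many trees are
     * still outstanding. Because the whole method is synchronized, only one
     * caller can ever see 0, so the differencers are created exactly once.
     */
    synchronized int addTree(InetAddress endpoint /* , MerkleTree tree */)
    {
        waitingFor.remove(endpoint);
        return waitingFor.size();
    }
}
{code}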

Also, without having run it, I'm not sure the change to the test works: since REMOTE won't ever send a Merkle tree, the differencers will never be generated. But to be honest, this test is really not very useful, and I would be fine with just removing it instead of faking things to the point where it doesn't really test anything.

[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2610:
----------------------------------------

    Fix Version/s:     (was: 0.8.5)
                   1.0

[jira] [Resolved] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-2610.
-----------------------------------------

    Resolution: Fixed
      Reviewer: jbellis

Committed, thanks

[jira] [Commented] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095895#comment-13095895 ] 

Sylvain Lebresne commented on CASSANDRA-2610:
---------------------------------------------

That last patch lgtm, I'll go ahead and commit.
