You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by "Peter Schuller (Created) (JIRA)" <ji...@apache.org> on 2012/02/15 08:27:59 UTC

[jira] [Created] (CASSANDRA-3912) support incremental repair controlled by external agent

support incremental repair controlled by external agent
-------------------------------------------------------

Key: CASSANDRA-3912
URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Peter Schuller
Assignee: Peter Schuller
Attachments: CASSANDRA-3912-trunk-v1.txt

As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).

Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:

{code}
nodetool repairincremental 0 100
nodetool repairincremental 1 100
...
nodetool repairincremental 99 100
{code}

An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.

The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.

An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.

Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Peter Schuller (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209117#comment-13209117 ] 

Peter Schuller edited comment on CASSANDRA-3912 at 2/16/12 5:22 AM:
--------------------------------------------------------------------

Agreed.

The good news is that the actual commands necessary ({{getprimaryrange}} and {{repairrange}}) are easy patches.

The bad news is that it turns out the AntiEntropyService does not support arbitrary ranges.

Attaching {{CASSANDRA\-3912\-v2\-001\-add\-nodetool\-commands.txt}} and {{CASSANDRA\-3912\-v2\-002\-fix\-antientropyservice.txt}}.

Had it not been for AES I'd want to propose we commit this to 1.1 since it would be additive only, but given the AES fix I don't know... I guess probably not?

It's a shame because I think it would be a boon to users with large nodes struggling with repair (despite the fact that, as you point out, each repair implies a flush).


                
      was (Author: scode):
    Agreed.

The good news is that the actual commands necessary ({{getprimaryrange}} and {{repairrange}}) are easy patches.

The bad news is that it turns out the AntiEntropyService does not support arbitrary ranges.

Attaching {{CASSANDRA\-3912\-v2\-001\-add\-nodetool\-commands.txt}} and {{CASSANDRA-3912-v2-002-fix-antientropyservice.txt}}.

Had it not been for AES I'd want to propose we commit this to 1.1 since it would be additive only, but given the AES fix I don't know... I guess probably not?

It's a shame because I think it would be a boon to users with large nodes struggling with repair (despite the fact that, as you point out, each repair implies a flush).


                  
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Stu Hood (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254391#comment-13254391 ] 

Stu Hood commented on CASSANDRA-3912:
-------------------------------------

bq. else if (range.intersects(toRepair))
The error message here could be a bit better: the solution is for the user to look at the cluster layout, and use exact tokens, right? Also, it should be possible to repair a range that falls on the boundary of two getLocalRanges, assuming it can be fully contained in their aggregate: So it would be possible to merge all local ranges into one big range, which would cause your {{if (range.contains(toRepair))}} check to succeed more frequently. Probably not worth it for this patch though, unless Peter needs it for something.

bq. For JMX sakes
For JMX's sake.

bq. starting user-requested repair of range
Would including the 'future.session.getName' in this log message be useful, in order to link the uuid of the repair to the message?

----

Looks like a good improvement to me. If Peter's happy, then +1 from me.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239291#comment-13239291 ] 

Sylvain Lebresne commented on CASSANDRA-3912:
---------------------------------------------

bq. What if we dropped our philosophy that client timestamps aren't necessarily timestamps? I can't think of anyone actually using non-clock-based timestamps. Then we could usefully compare server time to data timestamps and create a tree of "data inserted before repair beginning."

That wouldn't fully solves it because in-memtable resolution is lossy. In other words, even if timestamps are clock-based, you cannot get correct snapshot of in-memory data which is what we would need.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3912:
----------------------------------------

    Attachment: 3912_v2.txt

Attaching a new version (dubbed v2) of this. This changed the logic of getNeighbors a bit so that:
# it returns no node if the provided range is not part of the local node ownership. Currently, repair has been wrote with the assumption that the repaired range was a local one. It could be that the code is able to cope with the range not being a local one (I haven't checked) but I'd prefer avoiding the risk of foot-shooting currently.
# it only allows range that are contained in a local range. Which excludes ramge that overwrap 2 local ranges. I've explained the reason above: repair will repair data that doesn't need to be otherwise.

Those are not really limiting anyway, since the correct way to use this is to call repair on subsets of the primary range.

That patch also exposes forceTableRepairRange and getPrimaryRange (modified from the preceding patch to return a list of String instead of a map entry just because that's what other JMX function do) on JMX. The patch does not include the nodetool counterpart. The vague reason is that since using this does require some understanding of what's going on and to write some sufficiently smart external tooling to work correctly, it feels like exposing through JMX should be enough and exposing through nodetool may add more confusion than benefits. I don't stand very strongly on that though so I could be convinced otherwise if others feel differently.

I'll also note that I mentioned the flush problem for information/reminder purposes only. I think fixing this one problem is beyond the scope of this issue (which really just expose what already exists internally).


                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254729#comment-13254729 ] 

Sylvain Lebresne commented on CASSANDRA-3912:
---------------------------------------------

bq.  the solution is for the user to look at the cluster layout, and use exact tokens, right?

Well, kinda. The user should look at the layout, grab one of the range of the node, and then submit repair on a subset of that range. I fully agree this is not for the faint of heart (which is why I prefer not exposing it to nodetool just yet), but as it stand I'll admit I'm not sure how to improve that error message much.

bq. it should be possible to repair a range that falls on the boundary of two getLocalRanges, assuming it can be fully contained in their aggregate

Actually no. Or more precisely, in general it still have the problem mentioned above. Given 2 local ranges, there will be some neighbors that share one but not both of those ranges (so with these nodes the repair would be imprecise). In other words, to repair a range on the boundary of two local ranges, you'd really want to do 2 repairs on each subrange, because each time the set of neighbors will be different (we could do that splitting at the StorageService level but we should probably keep it simple for now. I see this ticket as just exposing existing code to advanced user, not reinventing repair).

bq. For JMX's sake.

Fixed :) (I've rebased the patch)

bq. Would including the 'future.session.getName' in this log message be useful

That's logged before the session is created.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236331#comment-13236331 ] 

Jonathan Ellis commented on CASSANDRA-3912:
-------------------------------------------

bq. As a side note, I'll remark that every repair of a range triggers a flush, so one should probably be careful to not repair incrementally on too small a range.

Is it worth evaluating using the range scan code to compute the trees instead of an sstable-only scanner?  That would let us avoid the flush.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Chris Goffinet (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-3912:
--------------------------------------

    Fix Version/s: 1.2
    
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Stu Hood (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238207#comment-13238207 ] 

Stu Hood commented on CASSANDRA-3912:
-------------------------------------

Didn't get a chance to review this this weekend: sorry! Hopefully Monday night.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3912:
----------------------------------------

    Fix Version/s:     (was: 1.2)
                   1.1.1

Moving fix version to 1.1.1 since this is a fairly trivial/non-disrupting patch.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.1.1
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209222#comment-13209222 ] 

Peter Schuller commented on CASSANDRA-3912:
-------------------------------------------

Instantiating a {{RangeStreamer}} is not entirely clean IMO, but for the moment I settled with that. I'll probably revisit and suggest a static utility or possibly even move it into {{TokenMetadata}} (probably not). Maybe just making it a static public in {{RangeStreamer}} is enough for now.

                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209118#comment-13209118 ] 

Peter Schuller commented on CASSANDRA-3912:
-------------------------------------------

For the record the second patch sneaks in a renaming of AES.getNeighbors() to AES.getSources(). Since it does filtering, it is in fact not returning all neighbors, as the old name implied (well, to me anyway).
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3912:
----------------------------------------

    Attachment: 3912_v2.txt
    
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3912:
----------------------------------------

    Attachment:     (was: 3912_v2.txt)
    
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208387#comment-13208387 ] 

Sylvain Lebresne commented on CASSANDRA-3912:
---------------------------------------------

I don't think that on the server side we should do more than expose the forceTableRepair method taking the Range in argument (and I do think there is value in doing that). The user friendly transformation of steps to range could probably be left to external scripts imo, but at least it should be moved to nodetool itself if we want it in.

As a side note, I'll remark that every repair of a range triggers a flush, so one should probably be careful to not repair incrementally on too small a range.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: CASSANDRA-3912-trunk-v1.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3912:
--------------------------------------

        Labels: jmx  (was: )
    Issue Type: New Feature  (was: Improvement)
    
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>              Labels: jmx
>             Fix For: 1.1.1
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Stu Hood (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238878#comment-13238878 ] 

Stu Hood commented on CASSANDRA-3912:
-------------------------------------

Hey Peter: would you mind rebasing this? There are a few new parameters to StorageService internal methods that I don't feel comfortable filling in.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3912:
----------------------------------------

    Attachment:     (was: 3912_v2.txt)
    
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Peter Schuller (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-3912:
--------------------------------------

    Attachment: CASSANDRA-3912-v2-002-fix-antientropyservice.txt
                CASSANDRA-3912-v2-001-add-nodetool-commands.txt

Agreed.

The good news is that the actual commands necessary ({{getprimaryrange}} and {{repairrange}}) are easy patches.

The bad news is that it turns out the AntiEntropyService does not support arbitrary ranges.

Attaching {{CASSANDRA\-3912\-v2\-001\-add\-nodetool\-commands.txt}} and {{CASSANDRA-3912-v2-002-fix-antientropyservice.txt}}.

Had it not been for AES I'd want to propose we commit this to 1.1 since it would be additive only, but given the AES fix I don't know... I guess probably not?

It's a shame because I think it would be a boon to users with large nodes struggling with repair (despite the fact that, as you point out, each repair implies a flush).


                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Stu Hood (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255245#comment-13255245 ] 

Stu Hood commented on CASSANDRA-3912:
-------------------------------------

#shipit from me, assuming Peter is happy.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239445#comment-13239445 ] 

Jonathan Ellis commented on CASSANDRA-3912:
-------------------------------------------

True, but if we're doing smaller chunks of data we may be able to tolerate some imprecision.  We already have a little, since it's quite unlikely that snapshots of actively-updated CFs across multiple replicas contain *exactly* the same updates.
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236448#comment-13236448 ] 

Sylvain Lebresne commented on CASSANDRA-3912:
---------------------------------------------

Note that allowing the repair of any range can actually be problematic, what we should do is allow only repair of a range that is fully contained in a replica range.
Consider 2 nodes A and B, where A is replica  for range (0, 100] and (100, 200] and B for range (100, 200] and (200, 300]. And say we ask to repair the range (50, 150]. The problem is that the merkle tree for that repair may end up hashing on the range (95, 105], but on that range, A and B cannot have the same hash (because A will have data on (95, 100] but B won't). It's not a huge deal, but there is no reason to allow user to shoot themselves in the foot.
I see 2 options:
* we simply throw an IllegalArgumentException if the user provide a range that is not fully contained in a replica range.
* we accept such range, but split it and trigger multiple repairs
My personal preference goes to the first option.

bq. Is it worth evaluating using the range scan code to compute the trees instead of an sstable-only scanner?

The problem is that for repair to not be overly inefficient, we need to make sure we compare "snapshots" of the data taken at "the same time". The flushing ensures that (It's obviously not exactly "at the same time", but it's as good as we can get it). I suppose you could say that if you repair very small ranges at a time, that problem would be minimized, and maybe that is true in practice, but that still feel fairly fragile to me (and would require very careful testing to make sure this does is acceptable in practice).
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236607#comment-13236607 ] 

Jonathan Ellis commented on CASSANDRA-3912:
-------------------------------------------

bq. The flushing ensures that (It's obviously not exactly "at the same time", but it's as good as we can get it)

What if we dropped our philosophy that client timestamps aren't necessarily timestamps?  I can't think of anyone actually using non-clock-based timestamps.  Then we could usefully compare server time to data timestamps and create a tree of "data inserted before repair beginning."
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3912:
----------------------------------------

    Attachment: 3912_v2.txt
    
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: 3912_v2.txt, CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Peter Schuller (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-3912:
--------------------------------------

    Attachment: CASSANDRA-3912-trunk-v1.txt
    
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: CASSANDRA-3912-trunk-v1.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-3912) support incremental repair controlled by external agent

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3912:
--------------------------------------

    Reviewer: stuhood

Stu, could you review?
                
> support incremental repair controlled by external agent
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3912
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3912-trunk-v1.txt, CASSANDRA-3912-v2-001-add-nodetool-commands.txt, CASSANDRA-3912-v2-002-fix-antientropyservice.txt
>
>
> As a poor man's pre-cursor to CASSANDRA-2699, exposing the ability to repair small parts of a range is extremely useful because it allows (with external scripting logic) to slowly repair a node's content over time. Other than avoiding the bulkyness of complete repairs, it means that you can safely do repairs even if you absolutely cannot afford e.g. disk spaces spikes (see CASSANDRA-2699 for what the issues are).
> Attaching a patch that exposes a "repairincremental" command to nodetool, where you specify a step and the number of total steps. Incrementally performing a repair in 100 steps, for example, would be done by:
> {code}
> nodetool repairincremental 0 100
> nodetool repairincremental 1 100
> ...
> nodetool repairincremental 99 100
> {code}
> An external script can be used to keep track of what has been repaired and when. This should allow (1) allow incremental repair to happen now/soon, and (2) allow experimentation and evaluation for an implementation of CASSANDRA-2699 which I still think is a good idea. This patch does nothing to help the average deployment, but at least makes incremental repair possible given sufficient effort spent on external scripting.
> The big "no-no" about the patch is that it is entirely specific to RandomPartitioner and BigIntegerToken. If someone can suggest a way to implement this command generically using the Range/Token abstractions, I'd be happy to hear suggestions.
> An alternative would be to provide a nodetool command that allows you to simply specify the specific token ranges on the command line. It makes using it a bit more difficult, but would mean that it works for any partitioner and token type.
> Unless someone can suggest a better way to do this, I think I'll provide a patch that does this. I'm still leaning towards supporting the simple "step N out of M" form though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira