Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2010/06/14 23:47:13 UTC

[jira] Created: (CASSANDRA-1190) Differentiate manual repair sessions from automatic

Differentiate manual repair sessions from automatic
---------------------------------------------------

                 Key: CASSANDRA-1190
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
             Project: Cassandra
          Issue Type: Bug
            Reporter: Stu Hood
            Priority: Critical
             Fix For: 0.6.3, 0.7


Currently both manual and automatic repair sessions use the same timeout value: TREE_STORE_TIMEOUT. This has the very negative effect of setting a maximum time that compaction can take before a manual repair will fail.

For automatic/natural repairs (triggered by two nodes autonomously finishing major compactions around the same time), you want a relatively low TREE_STORE_TIMEOUT value, because trees generated a long time apart will cause a lot of unnecessary repair. The current value is 10 minutes, to optimize for this case.

On the other hand, for manual repairs, TREE_STORE_TIMEOUT needs to be significantly higher. For instance, if a manual repair is triggered for a source node A storing 2 TB of data and a destination node B with an empty store, then node B needs to wait long enough for node A to finish compacting 2 TB of data, which might take more than 12 hours. If node B times out its local tree before node A sends its tree, then the repair will not occur.
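
To illustrate the failure mode described above, here is a minimal Python sketch. It is not Cassandra's actual code: the TreeStore class, its methods, and can_compare are hypothetical names invented for this example. Only the 10-minute timeout and the 12-hour compaction scenario come from the report.

```python
TEN_MINUTES = 10 * 60  # seconds; the current TREE_STORE_TIMEOUT cited in the report


class TreeStore:
    """Hypothetical stand-in for a node's store of locally generated trees."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._trees = {}  # column family name -> (tree, time stored)

    def put(self, cf, tree, now):
        """Remember the local tree for a column family at time `now` (seconds)."""
        self._trees[cf] = (tree, now)

    def get(self, cf, now):
        """Return the stored tree, or None if it is missing or has timed out."""
        entry = self._trees.get(cf)
        if entry is None:
            return None
        tree, stored_at = entry
        if now - stored_at > self.timeout_seconds:
            del self._trees[cf]  # expired: the local tree is discarded
            return None
        return tree


def can_compare(store, cf, remote_arrival_time):
    """A repair can only proceed if the local tree still exists when the remote tree arrives."""
    return store.get(cf, remote_arrival_time) is not None
```

With the 10-minute timeout, node B's tree stored at time 0 is gone by the time node A's tree arrives 12 hours later, so can_compare returns False. The same scenario with a 24-hour timeout returns True.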

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1190:
--------------------------------------

    Fix Version/s: 0.6.4
                       (was: 0.6.3)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.6.4, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment:     (was: 0002-Rename-readonly-compaction-to-validation-and-make-it.patch)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7



[jira] Commented: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904013#action_12904013 ] 

Jonathan Ellis commented on CASSANDRA-1190:
-------------------------------------------

Since the automatic repairs couldn't be relied upon, i.e., if you wanted to guarantee repair, you needed to schedule it anyway, I don't see what has really been lost.

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.7.0
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch



[jira] Commented: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882834#action_12882834 ] 

Hudson commented on CASSANDRA-1190:
-----------------------------------

Integrated in Cassandra #477 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/477/])
    allow multiple repair sessions per node.  patch by Stu Hood; reviewed by jbellis for CASSANDRA-1190


> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment:     (was: 0003-Request-ranges-in-addition-to-sending-them.patch)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment:     (was: 0004-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7



[jira] Commented: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882599#action_12882599 ] 

Jonathan Ellis commented on CASSANDRA-1190:
-------------------------------------------

committed 0001 to 0.6, but 0002 fails to apply

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch



[jira] Commented: (CASSANDRA-1190) Differentiate manual repair sessions from automatic

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878762#action_12878762 ] 

Stu Hood commented on CASSANDRA-1190:
-------------------------------------

Differentiating manual repairs from automatic repairs will require a network protocol change, so for 0.6, I think we should just bump TREE_STORE_TIMEOUT/NATURAL_REPAIR_FREQUENCY to values like 12/24 hours.

For trunk/0.7, it would make sense to remove the timeout entirely for manual repairs, and to make the timeout for automatic repairs a calculated value based on the size of the column family.
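
The trunk/0.7 proposal above could be sketched roughly as follows in Python. This is an illustration only, not code from any patch: the function name, the assumed compaction throughput, and the safety factor are all hypothetical parameters chosen for the example. The 10-minute floor reuses the current timeout value from the report.

```python
from typing import Optional

MIN_TIMEOUT_SECONDS = 10 * 60  # floor: the current 10-minute timeout from the report


def tree_timeout_seconds(manual: bool,
                         cf_size_bytes: int,
                         assumed_bytes_per_second: float = 50e6,  # hypothetical throughput
                         safety_factor: float = 2.0) -> Optional[float]:  # hypothetical margin
    """Return None (no timeout) for manual repairs, else a size-based timeout."""
    if manual:
        # Manual repairs: wait as long as the remote compaction takes.
        return None
    # Automatic repairs: scale the timeout with the estimated compaction time.
    estimated_compaction = cf_size_bytes / assumed_bytes_per_second
    return max(MIN_TIMEOUT_SECONDS, safety_factor * estimated_compaction)
```

Under these assumed numbers, a small column family keeps the 10-minute floor, while a 2 TB column family gets a timeout of roughly 22 hours. Any real calculation would need measured throughput rather than the placeholder default used here.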

> Differentiate manual repair sessions from automatic
> ---------------------------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Summary: Remove automatic repair sessions  (was: Differentiate manual repair sessions from automatic)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Comment: was deleted

(was: 0001 through 0003 remove automatic repairs without changing the network format.

0004 adds a session id to the network format to allow for concurrent repairs (considering they can take many hours to complete, and we don't want trees generated at different times to collide).

----

0001 through 0003 could be applied to 0.6, but without a column family argument to StreamIn.requestRanges (see my comment on CASSANDRA-1189), more data will be transferred than necessary.)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Request-ranges-in-addition-to-sending-them.patch, 0004-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch



[jira] Commented: (CASSANDRA-1190) Differentiate manual repair sessions from automatic

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879103#action_12879103 ] 

Stu Hood commented on CASSANDRA-1190:
-------------------------------------

Removing automatic repairs is in line with some other improvements I have in mind, so I'm fine with that.

> Differentiate manual repair sessions from automatic
> ---------------------------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

        Fix Version/s: 0.7.0
                           (was: 0.7 beta 1)
                           (was: 0.6.3)
    Affects Version/s:     (was: 0.6)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.7.0
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Eric Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Evans updated CASSANDRA-1190:
----------------------------------

    Fix Version/s: 0.6.3
                       (was: 0.6.4)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch
                0002-Rename-readonly-compaction-to-validation-and-make-it.patch
                0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch

0001 and 0002 remove automatic repairs, and should be safe to apply to 0.6.

0003 adds a session id to tree requests/responses, and should only be applied to 0.7.

Should be ready for review now!
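
As a rough illustration of why a session id on tree requests/responses allows concurrent repairs, here is a small Python sketch. It does not reflect the contents of the 0003 patch: the RepairSessions class, its method names, and the use of a UUID as the session id are assumptions made for this example.

```python
import uuid


class RepairSessions:
    """Hypothetical tracker that keys received trees by (session id, column family)."""

    def __init__(self):
        self._pending = {}  # (session_id, cf) -> {endpoint: tree}

    def new_session(self):
        """Create an id to attach to every tree request in one repair session."""
        return str(uuid.uuid4())

    def on_tree_response(self, session_id, cf, endpoint, tree):
        """Record a tree response.

        Returns the {endpoint: tree} pair once two endpoints have answered for
        this session and column family, otherwise None.
        """
        trees = self._pending.setdefault((session_id, cf), {})
        trees[endpoint] = tree
        if len(trees) < 2:
            return None
        # Both trees belong to the same session, so they are safe to compare.
        return self._pending.pop((session_id, cf))
```

Because responses are grouped by session id, a tree generated hours earlier for one session is never paired with a tree from a different session on the same column family.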

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment:     (was: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch)

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch
                0002-Rename-readonly-compaction-to-validation-and-make-it.patch
                0003-Request-ranges-in-addition-to-sending-them.patch

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Request-ranges-in-addition-to-sending-them.patch
>
>



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment: for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch
                for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch

Sorry for the delay: 'for-0.6' is rebased to apply to 0.6, and the original set can be applied to trunk.

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.6.4, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch
>
>



[jira] Assigned: (CASSANDRA-1190) Differentiate manual repair sessions from automatic

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood reassigned CASSANDRA-1190:
-----------------------------------

    Assignee: Stu Hood

> Differentiate manual repair sessions from automatic
> ---------------------------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>



[jira] Commented: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903785#action_12903785 ] 

Stu Hood commented on CASSANDRA-1190:
-------------------------------------

> Removing automatic repairs is in line with some other improvements I have in mind, so I'm fine with that.
The other improvements I had in mind fell through, and I haven't seen any good alternatives to major compactions for repairs. I'm wondering if disabling this was the wisest course of action.

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.6.3, 0.7 beta 1
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch
>
>



[jira] Reopened: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood reopened CASSANDRA-1190:
---------------------------------


> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.6.3, 0.7 beta 1
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch
>
>



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1190:
--------------------------------

    Attachment: 0004-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch

0001 through 0003 remove automatic repairs without changing the network format.

0004 adds a session id to the network format to allow for concurrent repairs (considering they can take many hours to complete, and we don't want trees generated at different times to collide).
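
As a rough sketch of why a session id avoids collisions (the class and method names below are hypothetical and are not taken from the 0004 patch), trees can be stored per session, so that two repairs running at the same time against the same endpoint never see each other's trees:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names are illustrative, not the API added by the patch.
final class SessionTreeStore {
    // session id -> (endpoint -> tree); a string stands in for the real tree.
    private final Map<String, Map<String, String>> bySession = new ConcurrentHashMap<>();

    // Record the tree an endpoint produced for one particular repair session.
    void store(String sessionId, String endpoint, String tree) {
        bySession.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>())
                 .put(endpoint, tree);
    }

    // Look up a tree for exactly this session; trees from other sessions are invisible.
    Optional<String> lookup(String sessionId, String endpoint) {
        Map<String, String> trees = bySession.get(sessionId);
        return trees == null ? Optional.empty() : Optional.ofNullable(trees.get(endpoint));
    }

    // Drop all trees of a session once it has completed.
    void finish(String sessionId) {
        bySession.remove(sessionId);
    }

    public static void main(String[] args) {
        SessionTreeStore store = new SessionTreeStore();

        // Two repairs involving the same endpoint, started at different times.
        store.store("session-1", "nodeA", "tree-built-earlier");
        store.store("session-2", "nodeA", "tree-built-later");

        System.out.println(store.lookup("session-1", "nodeA").orElse("none"));
        System.out.println(store.lookup("session-2", "nodeA").orElse("none"));

        // Finishing one session leaves the other untouched.
        store.finish("session-1");
        System.out.println(store.lookup("session-1", "nodeA").isPresent());
        System.out.println(store.lookup("session-2", "nodeA").isPresent());
    }
}
```

Because each session cleans up only its own entries, this layout needs no shared timeout to keep stale trees from different sessions apart.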

----

0001 through 0003 could be applied to 0.6, but without a column family argument to StreamIn.requestRanges (see my comment on CASSANDRA-1189), more data will be transferred than necessary.

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Request-ranges-in-addition-to-sending-them.patch, 0004-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch
>
>



[jira] Commented: (CASSANDRA-1190) Differentiate manual repair sessions from automatic

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878770#action_12878770 ] 

Jonathan Ellis commented on CASSANDRA-1190:
-------------------------------------------

I think I'd prefer simply removing automatic repairs. The probability of one happening by chance is low enough that having it happen once in a while is more confusing than useful.

> Differentiate manual repair sessions from automatic
> ---------------------------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>
>



[jira] Resolved: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1190.
---------------------------------------

      Reviewer: jbellis
    Resolution: Fixed

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.7.0
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch, for-0.6-0001-Remove-natural-repair-throttling-in-preparation-for-.patch, for-0.6-0002-Rename-readonly-compaction-to-validation-and-make-it.patch
>
>



[jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1190:
--------------------------------------

    Affects Version/s: 0.6
             Priority: Major  (was: Critical)
          Component/s: Core

reverted 0001, bumping to 0.6.4

> Remove automatic repair sessions
> --------------------------------
>
>                 Key: CASSANDRA-1190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1190
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.6.4, 0.7
>
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch, 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch
>
>
