Posted to commits@cassandra.apache.org by "Kenneth Failbus (JIRA)" <ji...@apache.org> on 2015/05/08 18:49:04 UTC

[jira] [Comment Edited] (CASSANDRA-7317) Repair range validation is too strict

    [ https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534858#comment-14534858 ] 

Kenneth Failbus edited comment on CASSANDRA-7317 at 5/8/15 4:48 PM:
--------------------------------------------------------------------

Folks,

I am seeing this error again in the 2.0.9 release. Vnodes are enabled in my cluster.
{code}
2015-05-08 15:01:56,021 [AntiEntropyStage:1] INFO Validator [repair #254edb00-f593-11e4-9397-51babce9f892] Sending completed merkle tree to /10.22.168.35 for CF1/Sequence
2015-05-08 15:01:58,518 [AntiEntropyStage:1] INFO Validator [repair #e3ca16e0-f592-11e4-bce3-6f1b5fa480b1] Sending completed merkle tree to /10.22.168.105 for system_auth/permissions
2015-05-08 15:01:58,791 [AntiEntropyStage:1] INFO Validator [repair #e3ca16e0-f592-11e4-bce3-6f1b5fa480b1] Sending completed merkle tree to /10.22.168.105 for system_auth/credentials
2015-05-08 15:01:58,980 [AntiEntropyStage:1] INFO Validator [repair #e3ca16e0-f592-11e4-bce3-6f1b5fa480b1] Sending completed merkle tree to /10.22.168.105 for system_auth/users
2015-05-08 15:02:00,640 [AntiEntropyStage:1] INFO Validator [repair #e0d31e00-f592-11e4-993e-c9cc22925782] Sending completed merkle tree to /10.22.168.97 for system_auth/credentials
2015-05-08 15:02:01,345 [AntiEntropyStage:1] INFO Validator [repair #e0d31e00-f592-11e4-993e-c9cc22925782] Sending completed merkle tree to /10.22.168.97 for system_auth/users
2015-05-08 15:02:01,577 [AntiEntropyStage:1] INFO Validator [repair #e0d31e00-f592-11e4-993e-c9cc22925782] Sending completed merkle tree to /10.22.168.97 for system_auth/permissions
2015-05-08 15:02:01,753 [AntiEntropyStage:1] INFO Validator [repair #27dba060-f593-11e4-873b-9d346bbba08e] Sending completed merkle tree to /10.22.168.87 for CF1/Sequence
2015-05-08 15:02:02,622 [AntiEntropyStage:1] INFO Validator [repair #dba213a0-f592-11e4-b745-192986bd7af2] Sending completed merkle tree to /10.22.168.117 for system_auth/credentials
2015-05-08 15:02:02,873 [AntiEntropyStage:1] INFO Validator [repair #dba213a0-f592-11e4-b745-192986bd7af2] Sending completed merkle tree to /10.22.168.117 for system_auth/users
2015-05-08 15:02:03,508 [AntiEntropyStage:1] INFO Validator [repair #dba213a0-f592-11e4-b745-192986bd7af2] Sending completed merkle tree to /10.22.168.117 for system_auth/permissions
2015-05-08 15:02:03,988 [AntiEntropyStage:1] INFO Validator [repair #d0a2ad70-f592-11e4-a5a2-b73fe73dbe79] Sending completed merkle tree to /10.22.168.109 for system_auth/credentials
2015-05-08 15:02:04,759 [AntiEntropyStage:1] INFO Validator [repair #d0a2ad70-f592-11e4-a5a2-b73fe73dbe79] Sending completed merkle tree to /10.22.168.109 for system_auth/users
2015-05-08 15:02:05,066 [AntiEntropyStage:1] INFO Validator [repair #d0a2ad70-f592-11e4-a5a2-b73fe73dbe79] Sending completed merkle tree to /10.22.168.109 for system_auth/permissions
2015-05-08 15:02:05,200 [Thread-227856] ERROR StorageService Repair session failed:
java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
        at org.apache.cassandra.service.ActiveRepairService.getNeighbors(ActiveRepairService.java:161)
        at org.apache.cassandra.repair.RepairSession.<init>(RepairSession.java:130)
        at org.apache.cassandra.repair.RepairSession.<init>(RepairSession.java:119)
        at org.apache.cassandra.service.ActiveRepairService.submitRepairSession(ActiveRepairService.java:97)
        at org.apache.cassandra.service.StorageService.forceKeyspaceRepair(StorageService.java:2628)
        at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2564)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.lang.Thread.run(Thread.java:744)
{code}



was (Author: kenfailbus):
Folks,

I am seeing this error again in the 2.0.9 release. Vnodes are enabled in my cluster.

> Repair range validation is too strict
> -------------------------------------
>
>                 Key: CASSANDRA-7317
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Nick Bailey
>            Assignee: Yuki Morishita
>             Fix For: 2.0.9
>
>         Attachments: 7317-2.0.txt, Untitled Diagram(1).png
>
>
> From what I can tell, the calculation (using the -pr option) and validation of tokens for repairing ranges are broken, or at least should be improved. Using an example with ccm:
> Nodetool ring:
> {noformat}
> Datacenter: dc1
> ==========
> Address    Rack        Status State   Load            Owns                Token
>                                                                           -10
> 127.0.0.1  r1          Up     Normal  188.96 KB       50.00%              -9223372036854775808
> 127.0.0.2  r1          Up     Normal  194.77 KB       50.00%              -10
> Datacenter: dc2
> ==========
> Address    Rack        Status State   Load            Owns                Token
>                                                                           0
> 127.0.0.4  r1          Up     Normal  160.58 KB       0.00%               -9223372036854775798
> 127.0.0.3  r1          Up     Normal  139.46 KB       0.00%               0
> {noformat}
> Schema:
> {noformat}
> CREATE KEYSPACE system_traces WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'dc2': '2',
>   'dc1': '2'
> };
> {noformat}
> Repair -pr:
> {noformat}
> [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 repair -pr system_traces
> [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 for range (0,-9223372036854775808] finished
> [2014-05-28 21:36:02,207] Repair command #12 finished
> [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 repair -pr system_traces
> [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 for range (-9223372036854775798,-10] finished
> [2014-05-28 21:36:14,406] Repair command #1 finished
> {noformat}
> Note that repairing both nodes in dc1 leaves very small ranges unrepaired, for example (-10,0]. Repairing the 'primary range' in dc2 will repair those small ranges. Maybe that is the behavior we want, but it seems counterintuitive.
> The behavior when manually trying to repair the full range of 127.0.0.1 definitely needs improvement, though.
> Repair command:
> {noformat}
> [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 repair -st -10 -et -9223372036854775808 system_traces
> [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for keyspace system_traces
> [2014-05-28 21:50:55,804] Repair command #17 finished
> [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
> 1
> {noformat}
> system.log:
> {noformat}
> ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) Repair session failed:
> java.lang.IllegalArgumentException: Requested range intersects a local range but is not fully contained in one; this would lead to imprecise repair
> {noformat}
> * The actual output of the repair command doesn't really indicate that there was an issue, although the command does return a nonzero exit status.
> * The error here is invisible if you are using the synchronous JMX repair API. It will appear as though the repair completed successfully.
> * Personally, I believe that should be a valid repair command. For the system_traces keyspace, 127.0.0.1 is responsible for this range (and I would argue it is the node's 'primary range').
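To make the rejected command concrete, here is a minimal Python sketch of the kind of check the error message describes: a requested range is accepted only if a single local range fully contains it, and rejected if it merely overlaps one. This is an illustration, not Cassandra's code. The helper names (unwrap, contains, intersects, validate) are hypothetical, and the list of local ranges assumes that, with two nodes and RF 2 in each datacenter, 127.0.0.1 replicates all four ring ranges.

```python
MIN_TOKEN = -2**63       # minimum token, as shown in the ring output
MAX_TOKEN = 2**63 - 1    # assumed maximum token of the same 64-bit space


def unwrap(rng):
    """Split a (left, right] ring range into non-wrapping (lo, hi] pieces.

    A range with left > right wraps around the ring; left == right is
    treated as the full ring. MIN_TOKEN - 1 serves as an exclusive lower
    bound so that MIN_TOKEN itself can be included.
    """
    left, right = rng
    if left < right:
        return [(left, right)]
    if left == right:
        return [(MIN_TOKEN - 1, MAX_TOKEN)]
    pieces = [(left, MAX_TOKEN), (MIN_TOKEN - 1, right)]
    return [(lo, hi) for lo, hi in pieces if lo < hi]


def contains(outer, inner):
    """True if every piece of inner lies inside some piece of outer."""
    return all(
        any(olo <= ilo and ihi <= ohi for olo, ohi in unwrap(outer))
        for ilo, ihi in unwrap(inner)
    )


def intersects(a, b):
    """True if the two ranges share at least one token."""
    return any(
        max(alo, blo) < min(ahi, bhi)
        for alo, ahi in unwrap(a)
        for blo, bhi in unwrap(b)
    )


def validate(requested, local_ranges):
    """Return the local range containing requested, or raise on partial overlap."""
    for local in local_ranges:
        if contains(local, requested):
            return local
        if intersects(local, requested):
            raise ValueError(
                "Requested range intersects a local range "
                "but is not fully contained in one"
            )
    return None


# Assumed local ranges of 127.0.0.1 for system_traces (all four ring ranges).
LOCAL_RANGES = [
    (0, MIN_TOKEN),
    (MIN_TOKEN, MIN_TOKEN + 10),
    (MIN_TOKEN + 10, -10),
    (-10, 0),
]

# The range from "-st -10 -et -9223372036854775808" in the report.
REQUESTED = (-10, MIN_TOKEN)
```

Under these assumptions, validate(REQUESTED, LOCAL_RANGES) raises, because (-10, MIN_TOKEN] spans both (-10, 0] and (0, MIN_TOKEN] without fitting inside either one, while validate((0, MIN_TOKEN), LOCAL_RANGES) succeeds. The node does hold every token in the requested range, which is the sense in which the check is "too strict".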

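The coverage gap from the "-pr" runs in the quoted report can be reproduced with a small Python sketch as well. It assumes a node's primary range is (previous token on the whole ring, own token], computed across both datacenters, which matches the ranges printed in the repair output. The function name primary_ranges and the dictionary layout are hypothetical.

```python
MIN_TOKEN = -2**63  # minimum token, as shown in the ring output

# Tokens taken from the "nodetool ring" output in the report.
TOKENS = {
    "127.0.0.1": MIN_TOKEN,        # dc1
    "127.0.0.2": -10,              # dc1
    "127.0.0.3": 0,                # dc2
    "127.0.0.4": MIN_TOKEN + 10,   # dc2, i.e. -9223372036854775798
}


def primary_ranges(tokens_by_node):
    """Map each node to its (previous_token, own_token] range on the ring."""
    ring = sorted((tok, node) for node, tok in tokens_by_node.items())
    result = {}
    for i, (tok, node) in enumerate(ring):
        prev_tok = ring[i - 1][0]  # i == 0 wraps around to the last token
        result[node] = (prev_tok, tok)
    return result


ranges = primary_ranges(TOKENS)

# Ranges covered when only the dc1 nodes run "repair -pr".
dc1_nodes = {"127.0.0.1", "127.0.0.2"}
repaired = {ranges[n] for n in dc1_nodes}
unrepaired = sorted(set(ranges.values()) - repaired)
```

With these inputs, the dc1 nodes get (0, MIN_TOKEN] and (MIN_TOKEN + 10, -10], the same ranges reported by the two repair sessions, and unrepaired holds (MIN_TOKEN, MIN_TOKEN + 10] and (-10, 0], which are the primary ranges of the dc2 nodes.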

