Posted to commits@cassandra.apache.org by "Ruoran Wang (JIRA)" <ji...@apache.org> on 2016/03/10 09:15:41 UTC

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

    [ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188891#comment-15188891 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 3/10/16 8:15 AM:
-----------------------------------------------------------------

We are running 2.1.13 with 1 DC, 6 nodes, LCS, and replication factor 3. We've done a full repair on the cluster and used sstablerepairedset to mark all SSTables as repaired.

However, when we run incremental repair (nodetool repair --in-local-dc -par -pr -inc KEYSPACE), we get the same error log on the repairing node, and the same DecoratedKey error on the node that is sending its merkle tree to the repairing node.
We tried scrub on the failing keyspace/column_family followed by a restart (first on the failing node, then on all nodes), but we still occasionally get repair failures, so we haven't been able to run incremental repair on our cluster.

{noformat}
ERROR [Thread-46463] 2016-03-06 06:02:34,632 StorageService.java:3050 - Repair session 01e9f1b0-e361-11e5-9531-ffeee0307673 for range (5646258101641427476,5658366818450316790] failed with error org.apache.cassandra.exceptions.RepairException: [repair #01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, (5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, (5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_66]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_66]
        at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:3041) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.13.jar:2.1.13]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, (5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.13.jar:2.1.13]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_66]
        ... 1 common frames omitted
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, (5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
        at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:415) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.13.jar:2.1.13]
        ... 3 common frames omitted
{noformat}

{noformat}
ERROR [ValidationExecutor:205] 2016-03-07 18:47:15,009 Validator.java:245 - Failed creating a merkle tree for [repair #02132fa0-e495-11e5-80cd-61571269f00d on challenges/message_by_modification, (2769065886542373503,2774747608185850009]], /10.57.198.15 (see log for details)
ERROR [ValidationExecutor:205] 2016-03-07 18:47:15,011 CassandraDaemon.java:229 - Exception in thread Thread[ValidationExecutor:205,1,main]
java.lang.AssertionError: row DecoratedKey(2769066505137675224, 00040000002e00000800000153441a3ef000) received out of order wrt DecoratedKey(2774747040849866654, 00040000019b0000080000015348847eb200)
        at org.apache.cassandra.repair.Validator.add(Validator.java:126) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1051) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:89) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:662) ~[apache-cassandra-2.1.13.jar:2.1.13]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
{noformat}
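
The assertion message in the second log indicates that a row reached the validator after a row with a larger token. The Python sketch below is not Cassandra code; it only illustrates that ordering invariant, assuming keys compare by their token (the first number inside each DecoratedKey). The function name first_out_of_order is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_out_of_order(tokens: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (row, previous) for the first token that does not strictly
    increase over its predecessor, or None if the sequence is ordered."""
    previous = None
    for token in tokens:
        if previous is not None and token <= previous:
            return token, previous
        previous = token
    return None


# Tokens copied from the two DecoratedKeys in the assertion message above.
# Assumption: the key named after "wrt" is the one that was seen first.
previous_token = 2774747040849866654
row_token = 2769066505137675224

print(first_out_of_order([previous_token, row_token]))
```

Both tokens fall inside the validated range (2769065886542373503,2774747608185850009], so the keys belong to the range being repaired; only their arrival order violates the check.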


was (Author: ruoranwang):
We are running 1 DC, 6 nodes, LCS, and replication factor 3. We've done a full repair on the cluster and used sstablerepairedset to mark all SSTables as repaired.

However, when we run incremental repair (nodetool repair --in-local-dc -par -pr -inc KEYSPACE), we get the same error log on the repairing node, and the same DecoratedKey error on the node that is sending its merkle tree to the repairing node.
We tried scrub on the failing keyspace/column_family followed by a restart (first on the failing node, then on all nodes), but we still occasionally get repair failures, so we haven't been able to run incremental repair on our cluster.

> Repair fails with RuntimeException
> ----------------------------------
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.8, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Yuki Morishita
>             Fix For: 2.1.x
>
>         Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702). After upgrading to 2.1.8 repair runs faster, but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>         at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
>         at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (8063716953988492222,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above that I see (at least twice in the attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80]
>         at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.8.jar:2.1.8]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.8.jar:2.1.8]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
>         ... 1 common frames omitted
> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         ... 3 common frames omitted
> INFO  [Thread-173887] 2015-07-29 20:44:03,854 StorageService.java:2952 - Repair session 846d9300-3608-11e5-a93e-4963524a8bde for range (-6705935742755245856,-6704072966568763453] finished
> {code}
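
Because the failed sessions are buried among many "finished" lines, a small script can help pull them out of system.log. The following is a rough Python sketch, assuming the ERROR line format shown in the excerpts above; the names FailedSession and failed_sessions are hypothetical, and the pattern only covers the "Validation failed in" form of the error.

```python
import re
from typing import Iterable, List, NamedTuple


class FailedSession(NamedTuple):
    session_id: str
    range_start: int
    range_end: int
    table: str      # "keyspace/column_family" as printed in the log
    endpoint: str   # node on which validation failed


# Pattern derived from the log excerpts above; other failure messages
# (anything other than "Validation failed in") are not matched.
FAILED_RE = re.compile(
    r"Repair session (?P<id>[0-9a-f-]{36}) "
    r"for range \((?P<start>-?\d+),(?P<end>-?\d+)\] failed with error "
    r".*? on (?P<table>[\w/]+), .* Validation failed in /(?P<endpoint>[\d.]+)"
)


def failed_sessions(lines: Iterable[str]) -> List[FailedSession]:
    """Collect one entry per 'Repair session ... failed with error' line."""
    found = []
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            found.append(FailedSession(
                match.group("id"),
                int(match.group("start")),
                int(match.group("end")),
                match.group("table"),
                match.group("endpoint"),
            ))
    return found


# Two lines copied from the excerpts above: one finished, one failed.
sample_log = [
    "INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - "
    "Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range "
    "(-7695808664784761779,-7693529816291585568] finished",
    "ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - "
    "Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range "
    "(5765414319217852786,5781018794516851576] failed with error "
    "org.apache.cassandra.exceptions.RepairException: "
    "[repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, "
    "(5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162",
]

sessions = failed_sessions(sample_log)
for session in sessions:
    print(session)
```

The "java.util.concurrent.ExecutionException" and "Caused by" lines of the stack trace do not contain the "Repair session ... for range" prefix, so each failed session is reported once.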


