Posted to commits@cassandra.apache.org by "Brent (JIRA)" <ji...@apache.org> on 2019/04/23 12:33:00 UTC

[jira] [Comment Edited] (CASSANDRA-12860) Nodetool repair fragile: cannot properly recover from single node failure. Has to restart all nodes in order to repair again

    [ https://issues.apache.org/jira/browse/CASSANDRA-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824071#comment-16824071 ] 

Brent edited comment on CASSANDRA-12860 at 4/23/19 12:32 PM:
-------------------------------------------------------------

I have the same issue on 3.11.4.

Repairing a small keyspace (about 1 GB) with the command:
{code}
nodetool repair -full -tr metrics
{code}
or with cassandra-reaper (on any performance setting)

causes one or more nodes to restart. The trace output of nodetool only says that streaming failed.

Note that all the nodes (3 in each DC, two DCs) have 2 cores and 4 GB RAM (1 GB heap space according to the default sizing formula)
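
For reference, a minimal sketch of the default heap-sizing calculation from cassandra-env.sh (the "formula" mentioned above), assuming the stock calculation is in effect; it yields a 1 GB heap for a 4 GB machine:

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Sketch of calculate_heap_sizes in cassandra-env.sh:
    take the larger of (half of RAM, capped at 1 GB) and
    (a quarter of RAM, capped at 8 GB)."""
    half = min(system_memory_mb // 2, 1024)     # 1/2 RAM, capped at 1024 MB
    quarter = min(system_memory_mb // 4, 8192)  # 1/4 RAM, capped at 8192 MB
    return max(half, quarter)

print(default_max_heap_mb(4096))  # a 4 GB node gets a 1024 MB heap
```

With only 1 GB of heap, the merkle-tree validation and streaming work done by a full repair can plausibly exhaust memory, which would be consistent with nodes dying mid-repair.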



> Nodetool repair fragile: cannot properly recover from single node failure. Has to restart all nodes in order to repair again
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12860
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12860
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: CentOS 6.7, Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode), Cassandra 3.5.0, fresh install
>            Reporter: Bing Wu
>            Priority: Urgent
>
> Summary of symptom:
> - Set up is a multi-region cluster in AWS (5 regions). Each region has at least 4 hosts, with RF equal to half the number of nodes, using vnodes (256)
> - How to reproduce:
> -- On node A, start this repair job (again we are running fresh 3.5.0): {code}nohup sudo nodetool repair -j 2 -pr -full myks > /tmp/repair.log 2>&1 &{code}
> -- Job starts fine, reporting progress like {noformat}
> [2016-10-28 22:37:52,692] Starting repair command #1, repairing keyspace myks with repair options (parallelism: parallel, primary range: true, incremental: false, job threads: 2, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 256)
> [2016-10-28 22:38:35,099] Repair session 36f13450-9d5f-11e6-8bf7-a9f47ff986a9 for range [(4029874034937227774,4033949979656106020]] finished (progress: 1%)
> [2016-10-28 22:38:38,769] Repair session 36f30910-9d5f-11e6-8bf7-a9f47ff986a9 for range [(-2395606719402271267,-2394525508513518837]] finished (progress: 1%)
> [2016-10-28 22:38:48,521] Repair session 36f3f370-9d5f-11e6-8bf7-a9f47ff986a9 for range [(-5223108861718702793,-5221117649630514419]] finished (progress: 2%)
> {noformat}
> -- Then manually shut down another node (node B) in the same region (haven't tried other regions yet, but expect the same behavior based on past experience)
> -- Shortly after that seeing this message from job log (as well as in system.log) on node A: {noformat}
> [2016-10-28 22:41:46,268] Repair session 37088ce1-9d5f-11e6-8bf7-a9f47ff986a9 for range [(-928974038666914990,-927967994563261540]] failed with error Endpoint /node_B_ip died (progress: 51%)
> {noformat}
> -- From this point on, the repair job seems to hang:
> --- no further messages in the job log
> --- no related messages in system.log
> --- CPU stayed low (low single-digit percent of 1 CPU)
> -- After an hour, manually kill the repair processes (found via "ps -eaf | grep repair")
> -- Restart C* on node A
> --- Verified system is up and no error messages in system.log
> --- Also verified that there is no error messages from node B
> -- After node A settles down (e.g. no new messages from system.log), restart the same repair job: {code}nohup sudo nodetool repair -j 2 -pr -full myks > /tmp/repair.log 2>&1 &{code}
> -- Job fails pretty quickly, reporting errors from nodes B and K: {noformat} <production>[ywu@cass-tm-1b-012.apse1.mashery.com ~]$ tail -f /tmp/repair.log 
> [2016-10-28 22:49:52,965] Starting repair command #1, repairing keyspace myks with repair options (parallelism: parallel, primary range: true, incremental: false, job threads: 2, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 256)
> [2016-10-28 22:50:15,839] Repair session e4180720-9d60-11e6-b2f9-cb9524b3c536 for range [(4029874034937227774,4033949979656106020]] failed with error [repair #e4180720-9d60-11e6-b2f9-cb9524b3c536 on myks/rtable, [(4029874034937227774,4033949979656106020]]] Validation failed in /node_K_ip (progress: 1%)
> [2016-10-28 22:50:17,158] Repair session e419dbe0-9d60-11e6-b2f9-cb9524b3c536 for range [(-2395606719402271267,-2394525508513518837]] failed with error [repair #e419dbe0-9d60-11e6-b2f9-cb9524b3c536 on myks/rtable, [(-2395606719402271267,-2394525508513518837]]] Validation failed in /node_B_ip (progress: 1%)
> [2016-10-28 22:50:18,256] Repair session e41b1460-9d60-11e6-b2f9-cb9524b3c536 for range [(-5223108861718702793,-5221117649630514419]] failed with error [repair #e41b1460-9d60-11e6-b2f9-cb9524b3c536 on myks/rtable, [(-5223108861718702793,-5221117649630514419]]] Validation failed in /node_B_ip (progress: 2%)
> {noformat}
> -- On the said nodes (B and K), seeing similar errors: {noformat}
> ERROR [ValidationExecutor:5] 2016-10-28 22:58:45,307 CompactionManager.java:1320 - Cannot start multiple repair sessions over the same sstables
> ERROR [ValidationExecutor:5] 2016-10-28 22:58:45,307 Validator.java:261 - Failed creating a merkle tree for [repair #14378ec0-9d62-11e6-ab75-cd4d64a01b02 on myks/atable, [(4029874034937227774,4033949979656106020]]], /52.220.127.190 (see log for details)
> INFO  [AntiEntropyStage:1] 2016-10-28 22:58:45,307 Validator.java:274 - [repair #14378ec0-9d62-11e6-ab75-cd4d64a01b02] Sending completed merkle tree to /52.220.127.190 for myks.xtable
> ERROR [ValidationExecutor:5] 2016-10-28 22:58:45,308 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:5,1,main]
> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>         at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1321) ~[apache-cassandra-3.5.0.jar:3.5.0]
>         at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1211) ~[apache-cassandra-3.5.0.jar:3.5.0]
>         at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:81) ~[apache-cassandra-3.5.0.jar:3.5.0]
>         at org.apache.cassandra.db.compaction.CompactionManager$11.call(CompactionManager.java:841) ~[apache-cassandra-3.5.0.jar:3.5.0]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_102]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
> INFO  [AntiEntropyStage:1] 2016-10-28 22:58:45,318 Validator.java:274 - [repair #14378ec0-9d62-11e6-ab75-cd4d64a01b02] Sending completed merkle tree to /52.220.127.190 for myks.ytable
> {noformat}
> -- At this point, we are back where we started: kill the repair job on node A, then restart C* on BOTH nodes A and K, but we still see the same exceptions, except that sometimes they appear on other servers all over the ring.
> - Business impact: I am in the process of launching a Cassandra-based production system, but I have to hold back now because of how fragile repair is. And I am told by many sources that I have to rely on periodic repair jobs to fix data inconsistencies.
> - The only workaround was a rolling restart of the Cassandra server on ALL nodes in the entire cluster
> -- Then the repair job can proceed without any error
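
The failure pattern in the logs quoted above can be detected mechanically. Below is a minimal sketch (a hypothetical helper, not part of nodetool) that scans repair log lines of the form shown in the report and returns the session ids that failed, so a wrapper script could abort and alert instead of leaving the job hanging:

```python
import re

# Matches nodetool repair progress lines such as:
# [2016-10-28 22:50:15,839] Repair session <uuid> for range [...] failed with error ...
SESSION_RE = re.compile(
    r"Repair session (?P<sid>[0-9a-f-]+) for range .* (?P<status>finished|failed)"
)

def failed_sessions(log_lines):
    """Return the session ids that reported 'failed with error'."""
    failed = []
    for line in log_lines:
        m = SESSION_RE.search(line)
        if m and m.group("status") == "failed":
            failed.append(m.group("sid"))
    return failed

# Sample lines taken from the report above
log = [
    "[2016-10-28 22:38:35,099] Repair session 36f13450-9d5f-11e6-8bf7-a9f47ff986a9 "
    "for range [(4029874034937227774,4033949979656106020]] finished (progress: 1%)",
    "[2016-10-28 22:41:46,268] Repair session 37088ce1-9d5f-11e6-8bf7-a9f47ff986a9 "
    "for range [(-928974038666914990,-927967994563261540]] failed with error "
    "Endpoint /node_B_ip died (progress: 51%)",
]
print(failed_sessions(log))  # ['37088ce1-9d5f-11e6-8bf7-a9f47ff986a9']
```

A wrapper built on this could kill its own repair child process on the first failed session, which avoids the hung state described above, though it does not address the underlying "Cannot start multiple repair sessions over the same sstables" error.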



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org