Posted to user@cassandra.apache.org by Paul Nickerson <pg...@gmail.com> on 2015/02/10 19:34:24 UTC

Re: Repairing OpsCenter rollups60 Results in Snapshot Errors

Thank you, Reynald. I have contributed to that issue, but I cannot
participate further right now because I am now dealing with an out-of-memory
problem, which may be unrelated. I think I'll start a new thread on this list
for that.


 ~ Paul Nickerson

On Thu, Jan 29, 2015 at 11:15 AM, Reynald Bourtembourg <reynald.bourtembourg@esrf.fr> wrote:

>  Hi Paul,
>
> There is a JIRA ticket about this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-8696
>
> I saw these errors too the last time I ran "nodetool repair".
> I would also be interested to know the answer to the questions you were
> asking:
>
> "Are these errors problematic? Should I just let the repair process
> continue for however long it takes? "
> "I am wondering whether this is making the repair ineffectual."
>
> Best regards
>
> Reynald
>
>
>
> On 29/01/2015 17:03, Paul Nickerson wrote:
>
>  I am running a 6 node cluster using Apache Cassandra 2.1.2 with DataStax
> OpsCenter 5.0.2 from the AWS EC2 AMI "DataStax Auto-Clustering AMI
> 2.5.1-hvm" (DataStax Community AMI). When I try to run a repair on the
> rollups60 column family in the OpsCenter keyspace, I get errors about
> failed snapshot creation in the Cassandra system log. The repair seems to
> continue, and then finishes with errors.
>
>  I am wondering whether this is making the repair ineffectual.
>
>  I am running the command
>
>      nodetool repair OpsCenter rollups60
>
>  on one of the nodes (10.63.74.70). From the command, I get this output:
>
>      [2015-01-23 19:36:06,261] Starting repair command #9, repairing 511
> ranges for keyspace OpsCenter (seq=true, full=true)
>     [2015-01-23 21:08:16,242] Repair session
> 67772db0-a337-11e4-9e78-37e5027a626b for range
> (5848435723460298978,5868916338423419522] failed with error
> java.io.IOException: Failed during snapshot creation.
>
>  The error is repeated many times, and every occurrence appears right at the
> end. Here is an example of what I see in the log on that same system (the one
> that I'm running the command from, and the one that's trying to snapshot):
>
>      INFO  [AntiEntropyStage:1] 2015-01-23 19:38:28,235
> RepairSession.java:171 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b]
> Received merkle tree for rollups60 from /10.63.74.70
>     INFO  [AntiEntropySessions:9] 2015-01-23 19:38:28,236
> RepairSession.java:260 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] new
> session: will sync /10.63.74.70, /10.51.180.16 on range
> (5848435723460298978,5868916338423419522] for OpsCenter.[rollups60]
>     INFO  [RepairJobTask:3] 2015-01-23 19:38:28,237 Differencer.java:74 -
> [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Endpoints /10.13.157.190
> and /10.63.74.70 have 1 range(s) out of sync for rollups60
>     INFO  [AntiEntropyStage:1] 2015-01-23 19:38:28,237
> ColumnFamilyStore.java:840 - Enqueuing flush of rollups60: 465365 (0%)
> on-heap, 0 (0%) off-heap
>     INFO  [MemtableFlushWriter:25] 2015-01-23 19:38:28,238
> Memtable.java:325 - Writing Memtable-rollups60@204861223(51960 serialized
> bytes, 1395 ops, 0%/0% of on/off-heap limit)
>     INFO  [RepairJobTask:3] 2015-01-23 19:38:28,239
> StreamingRepairTask.java:68 - [streaming task
> #138b42e0-a337-11e4-9e78-37e5027a626b] Performing streaming repair of 1
> ranges with /10.13.157.190
>     INFO  [MemtableFlushWriter:25] 2015-01-23 19:38:28,262
> Memtable.java:364 - Completed flushing
> /raid0/cassandra/data/OpsCenter/rollups60-445613507ca411e4bd3f1927a2a71193/OpsCenter-rollups60-ka-331933-Data.db
> (29998 bytes) for commitlog position
> ReplayPosition(segmentId=1422038939094, position=31047766)
>     ERROR [RepairJobTask:2] 2015-01-23 19:38:39,067 RepairJob.java:127 -
> Error occurred during snapshot phase
>     java.lang.RuntimeException: Could not create snapshot at /10.63.74.70
>             at
> org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>             at
> org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>             at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> ~[na:1.7.0_51]
>             at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> ~[na:1.7.0_51]
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_51]
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_51]
>             at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
>     INFO  [AntiEntropySessions:10] 2015-01-23 19:38:39,068
> RepairSession.java:260 - [repair #6dec29c0-a337-11e4-9e78-37e5027a626b] new
> session: will sync /10.63.74.70, /10.51.180.16 on range
> (-6918744323658665195,-6916171087863528821] for OpsCenter.[rollups60]
>     ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,068
> RepairSession.java:303 - [repair #67772db0-a337-11e4-9e78-37e5027a626b]
> session completed with the following error
>     java.io.IOException: Failed during snapshot creation.
>             at
> org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>             at
> org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>             at
> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
> ~[guava-16.0.jar:na]
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_51]
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_51]
>             at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
>     ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,070
> CassandraDaemon.java:153 - Exception in thread
> Thread[AntiEntropySessions:9,5,RMI Runtime]
>     java.lang.RuntimeException: java.io.IOException: Failed during
> snapshot creation.
>             at
> com.google.common.base.Throwables.propagate(Throwables.java:160)
> ~[guava-16.0.jar:na]
>             at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>             at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> ~[na:1.7.0_51]
>             at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> ~[na:1.7.0_51]
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_51]
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_51]
>             at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
>     Caused by: java.io.IOException: Failed during snapshot creation.
>             at
> org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>             at
> org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>             at
> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
> ~[guava-16.0.jar:na]
>             ... 3 common frames omitted
>
>  The errors are repeated many times. The IP Address 10.63.74.70 in the
> log is the node I'm running the repair from. I am able to repair the rest
> of the OpsCenter column families, and they complete pretty quickly without
> error.
>
>  I have tried creating my own snapshot, and it completes successfully
> with nothing logged.
>
>      nodetool snapshot OpsCenter
>
>  The disk has plenty of space left. Are these errors problematic? Should
> I just let the repair process continue for however long it takes? The
> cluster is currently not in use by any application, yet it has some load
> while it's trying this repair, so it's not sitting idle (it has no load
> when I'm not repairing).
>
>  Thanks for any help.
>
>  And if this is the wrong place to ask about a DataStax Community thing,
> could someone point me in the right direction?
>
>   ~ Paul Nickerson
>
>
>
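Since the failure line in the quoted nodetool output is repeated many times, it may help to tally which sessions and token ranges failed before deciding whether to re-run the repair. What follows is only a minimal sketch in Python. It assumes each failure is printed on a single line in the same shape as the one quoted above from Cassandra 2.1.2; other versions may print something different. The names FAILED, summarize_failures and count_by_error are made up for illustration and are not part of nodetool or Cassandra.

```python
import re
from collections import Counter

# Pattern modelled on the single failure line quoted in this thread.
# Assumption: other Cassandra versions may word this line differently.
FAILED = re.compile(
    r"Repair session (?P<session>[0-9a-f-]+) for range "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\] failed with error (?P<error>.+)"
)


def summarize_failures(lines):
    """Return (session_id, range_start, range_end, error) for each failed session."""
    failures = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            failures.append(
                (
                    match["session"],
                    int(match["start"]),
                    int(match["end"]),
                    match["error"].strip(),
                )
            )
    return failures


def count_by_error(failures):
    """Count how many failed sessions reported each distinct error message."""
    return Counter(error for _, _, _, error in failures)
```

To use it, save the output of the repair command to a file, pass the file's lines to summarize_failures, and then pass the result to count_by_error. This only summarizes what nodetool printed; it does not say whether the ranges that did not fail were repaired correctly.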