Posted to user@cassandra.apache.org by me...@vrischmann.me on 2016/03/31 12:10:49 UTC

Runtime exception during repair job task

Hi all,
 
Recently we tried to repair one of our biggest tables, and we keep
getting hit by errors related to hard links. Here's a stack trace:
 
ERROR [RepairJobTask:4] 2016-03-31 05:47:27,268 RepairJob.java:145 - Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.51.0.7
        at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:48) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
ERROR [AntiEntropyStage:39] 2016-03-31 05:47:27,268 CassandraDaemon.java:223 - Exception in thread Thread[AntiEntropyStage:39,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Tried to hard link to file that does not exist /data/db/ks/table-a24af0002ed511e5b983ade99871dd76/ks-table-ka-50582-Statistics.db
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:141) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]
Caused by: java.lang.RuntimeException: Tried to hard link to file that does not exist /data/db/ks/table-a24af0002ed511e5b983ade99871dd76/ks-table-ka-50582-Statistics.db
        at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:90) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1799) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:2237) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2319) ~[apache-cassandra-2.1.5.jar:2.1.5]
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:82) ~[apache-cassandra-2.1.5.jar:2.1.5]
        ... 4 common frames omitted
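
The "Caused by" frames show the failure happening while the snapshot step hard-links SSTable files. As a rough illustration of how such an error can arise, here is a minimal Python sketch (not Cassandra's code) of a snapshot that hard-links a previously collected list of files and fails when one of them has since disappeared. The function name `snapshot_by_hard_link` and its parameters are hypothetical; only the error message text is taken from the log above.

```python
import os
from pathlib import Path


def snapshot_by_hard_link(sstable_files, snapshot_dir):
    """Hard-link each listed file into snapshot_dir.

    Hypothetical helper for illustration only. It raises if a file in the
    list no longer exists on disk, which is the condition the log reports.
    """
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    for src in map(Path, sstable_files):
        if not src.exists():
            # The in-memory file list is stale: the file was listed earlier
            # but is gone by the time the link is attempted.
            raise RuntimeError(
                f"Tried to hard link to file that does not exist {src}"
            )
        # A hard link shares the same data blocks, so no data is copied.
        os.link(src, snapshot_dir / src.name)
```

In this sketch a single missing component aborts the whole snapshot, while links already created for earlier files are left in place.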
 
I tried Googling for that particular error but did not find a
definitive answer. The usual recommendation seems to be to simply
restart the node. However, we're getting this error at least once a day
and sometimes on multiple nodes (we have 7 nodes currently), so it's
getting tedious to restart Cassandra every time.
 
I saw the issue https://issues.apache.org/jira/browse/CASSANDRA-6433
and it suggests it's due to a drop of a keyspace, but we didn't do
any drop. So I'm not sure that issue really applies, although the
error is related.
 
This issue https://issues.apache.org/jira/browse/CASSANDRA-6716 reports
the same exception but we didn't do any scrubbing, so I'm not sure it
applies either.
 
We're running Cassandra 2.1.5, by the way. I don't know if upgrading will
fix the problem, because I didn't see anything related to it
in the changelogs.
 
I'm wondering if these exceptions somehow "block" the repair,
because the repair is very slow right now (it has been running
for days).

Re: Runtime exception during repair job task

Posted by Carlos Alonso <in...@mrcalonso.com>.
This is probably due to corrupt data or a Cassandra upgrade where you
didn't run upgradesstables.

I'd then suggest scrubbing the column family (or upgrading it).
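
Before scrubbing, it may be worth checking whether any SSTable component files are actually missing on disk. The following is a minimal sketch, assuming the 2.1 "ka" file naming seen in the stack trace (`<keyspace>-<table>-ka-<generation>-<Component>.<ext>`). The `missing_components` function and the `EXPECTED` set are hypothetical and only cover a few components, so adjust them to your setup.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming, inferred from the path in the log, e.g.
# ks-table-ka-50582-Statistics.db
NAME = re.compile(r"^(?P<prefix>.+-ka-\d+)-(?P<component>[A-Za-z]+)\.[A-Za-z0-9]+$")

# Hypothetical subset of components to check for each SSTable.
EXPECTED = {"Data", "Index", "Statistics"}


def missing_components(table_dir):
    """Return {sstable_prefix: [missing component names]} for table_dir."""
    found = defaultdict(set)
    for path in Path(table_dir).iterdir():
        match = NAME.match(path.name)
        if path.is_file() and match:
            found[match.group("prefix")].add(match.group("component"))
    # Keep only SSTables that lack at least one expected component.
    return {
        prefix: sorted(EXPECTED - components)
        for prefix, components in found.items()
        if EXPECTED - components
    }
```

For example, `missing_components("/data/db/ks/table-a24af0002ed511e5b983ade99871dd76")` would list each SSTable prefix in that directory that lacks one of the expected components. This only inspects the directory at one point in time, so it would not detect a file that is removed while a snapshot is in progress.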

Hope it helps.

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 31 March 2016 at 12:10, <me...@vrischmann.me> wrote:

> Hi all,
>
> Recently we tried to repair one of our biggest table, and we keep getting
> hit by errors related to hard link. Here's a stacktrace:
>
> ERROR [RepairJobTask:4] 2016-03-31 05:47:27,268 RepairJob.java:145 - Error
> occurred during snapshot phase
> java.lang.RuntimeException: Could not create snapshot at /10.51.0.7
>         at
> org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:48)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_80]
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> ERROR [AntiEntropyStage:39] 2016-03-31 05:47:27,268
> CassandraDaemon.java:223 - Exception in thread
> Thread[AntiEntropyStage:39,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Tried to hard link
> to file that does not exist
> /data/db/ks/table-a24af0002ed511e5b983ade99871dd76/ks-table-ka-50582-Statistics.db
>         at
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:141)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_80]
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ~[na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]
> Caused by: java.lang.RuntimeException: Tried to hard link to file that
> does not exist
> /data/db/ks/table-a24af0002ed511e5b983ade99871dd76/ks-table-ka-50582-Statistics.db
>         at
> org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:90)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1799)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:2237)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2319)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         at
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:82)
> ~[apache-cassandra-2.1.5.jar:2.1.5]
>         ... 4 common frames omitted
>
> I tried Googling for that particular error and I did not find a definitive
> answer, instead what seems to be recommended is to simply restart the node.
> However, we're getting this error at least once a day and sometimes on
> multiple nodes (we have 7 nodes currently), so it's getting tedious to
> restart cassandra every time.
>
> I saw the issue https://issues.apache.org/jira/browse/CASSANDRA-6433 and
> it suggests it's due to a drop of a keyspace, but we didn't do any drop. So
> I'm not sure that issue really applies, although the error is related.
>
> This issue https://issues.apache.org/jira/browse/CASSANDRA-6716 reports
> the same exception but we didn't do any scrubbing, so I'm not sure it
> applies either.
>
> We're running cassandra 2.1.5 by the way. I don't know if upgrading will
> fix the problems, because I didn't really see anything related to this
> looking in the changelogs.
>
> I'm wondering if getting these exceptions will somehow "block" the repair,
> because it seems the repair is super slow right now (we're talking days
> repairing).
>