Posted to user@cassandra.apache.org by Christian Lorenz <Ch...@webtrekk.com> on 2017/12/01 09:04:39 UTC

Re: Node crashes on repair (Cassandra 3.11.1)

Hi Jeff,

the repairs worked fine before, on version 3.9. I noticed that the validation tasks during a repair are no longer bounded by the concurrent_compactors value.
Could that be putting too much pressure on the node, so that it gets overloaded?
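To check whether that is what's happening, something like the following should show the validation load on the node while the repair runs. This is just a sketch; I'm assuming the ValidationExecutor pool is reported by tpstats on 3.11:

        # active compactions, including the validation compactions triggered by the repair
        nodetool compactionstats
        # thread pool stats; a growing Pending count on ValidationExecutor would point to validation pressure
        nodetool tpstats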

Greetings,
Christian

From: Jeff Jirsa <jj...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, 30 November 2017 at 19:46
To: cassandra <us...@cassandra.apache.org>
Subject: Re: Node crashes on repair (Cassandra 3.11.1)

That was worded poorly. The tree has a maximum depth of 20, so it is the same size for any range larger than 2**20.
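(Concretely, a depth of 20 means at most 2**20 = 1,048,576 leaf ranges per tree, so the tree's size is capped no matter how much data the repaired range covers.)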


On Thu, Nov 30, 2017 at 10:43 AM, Jeff Jirsa <jj...@gmail.com> wrote:
Merkle trees have a fixed size/depth (2**20), so it’s not that, but it could be timing out elsewhere (or still running validation or something)
--
Jeff Jirsa


On Nov 30, 2017, at 10:12 AM, Javier Canillas <ja...@gmail.com> wrote:
Christian,

I'm not an expert, but maybe the Merkle tree is too big to transfer between nodes and that's why it times out. How many nodes do you have, and what's the size of the keyspace? Have you ever done a successful repair before?

Cassandra Reaper runs repairs per token range (or even part of one); that's why it only needs a small Merkle tree.
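For illustration only (this isn't literally what Reaper runs): a manual subrange repair looks roughly like the following, where the keyspace name and the token values are placeholders you'd take from your own ring:

        # full repair of a single token subrange; my_keyspace and the tokens are placeholders
        nodetool repair -full -st 0 -et 100000000000000000 my_keyspace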

Regards,

Javier.

2017-11-30 6:48 GMT-03:00 Christian Lorenz <Ch...@webtrekk.com>:
Hello,

after updating our cluster to Cassandra 3.11.1 (previously 3.9), running a ‘nodetool repair --full’ leads to the node crashing.
The log file shows the following exception:
ERROR [ReadRepairStage:36] 2017-11-30 07:42:06,439 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:36,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:199) ~[apache-cassandra-3.11.1.jar:3.11.1]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.1.jar:3.11.1]
        at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.1.jar:3.11.1]
        at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:76) ~[apache-cassandra-3.11.1.jar:3.11.1]
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.1.jar:3.11.1]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.1.jar:3.11.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_151]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.1.jar:3.11.1]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]

The node's data size is ~270 GB. A repair with Cassandra Reaper works fine, though.

Any idea why this could be happening?

Regards,
Christian



Re: Node crashes on repair (Cassandra 3.11.1)

Posted by Christian Lorenz <Ch...@webtrekk.com>.
I think we’ve hit the bug described here:

https://issues.apache.org/jira/browse/CASSANDRA-14096

Regards,
Christian

From: Christian Lorenz <Ch...@webtrekk.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Friday, 1 December 2017 at 10:04
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Node crashes on repair (Cassandra 3.11.1)
