Posted to user@cassandra.apache.org by Dan Kinder <dk...@turnitin.com> on 2017/09/28 16:20:13 UTC

Hi,

I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and am seeing the
following: the cluster does function for a while, but then some stages begin
to back up, and the node does not recover and does not drain the tasks, even
under no load. This happens to both MutationStage and GossipStage.

I do see the following exception in the logs:


ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:2328,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
        at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_91]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]


But it's hard to correlate this exception precisely with things going bad. It
is also very strange to me, since I have both read_repair_chance and
dclocal_read_repair_chance set to 0.0 for ALL of my tables, so it is
confusing why ReadRepairStage would error at all.
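
For reference, one way to double-check those settings on 3.x is to query the
schema tables directly ('my_ks' below is a placeholder keyspace name):

$ cqlsh -e "SELECT table_name, read_repair_chance, dclocal_read_repair_chance
            FROM system_schema.tables WHERE keyspace_name = 'my_ks';"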

Anyone have thoughts on this? It's pretty puzzling, and it causes nodes to
lock up. Once it happens Cassandra can't even shut down; I have to kill -9.
If I can't find a resolution I'm going to need to downgrade and restore from
backup...

The only issue I found that looked similar is
https://issues.apache.org/jira/browse/CASSANDRA-12689, but that appears to
have been fixed in 3.10.


$ nodetool tpstats

Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0         582103         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                    11        11           2868         0                 0
MutationStage                         32   4593678       55057393         0                 0
GossipStage                            1      2818         371487         0                 0
RequestResponseStage                   0         0        4345522         0                 0
ReadRepairStage                        0         0         151473         0                 0
CounterMutationStage                   0         0              0         0                 0
MemtableFlushWriter                    1        81             76         0                 0
MemtablePostFlush                      1       382            139         0                 0
ValidationExecutor                     0         0              0         0                 0
ViewMutationStage                      0         0              0         0                 0
CacheCleanupExecutor                   0         0              0         0                 0
PerDiskMemtableFlushWriter_10          0         0             69         0                 0
PerDiskMemtableFlushWriter_11          0         0             69         0                 0
MemtableReclaimMemory                  0         0             81         0                 0
PendingRangeCalculator                 0         0             32         0                 0
SecondaryIndexManagement               0         0              0         0                 0
HintsDispatcher                        0         0            596         0                 0
PerDiskMemtableFlushWriter_1           0         0             69         0                 0
Native-Transport-Requests             11         0        4547746         0                67
PerDiskMemtableFlushWriter_2           0         0             69         0                 0
MigrationStage                         1      1545            586         0                 0
PerDiskMemtableFlushWriter_0           0         0             80         0                 0
Sampler                                0         0              0         0                 0
PerDiskMemtableFlushWriter_5           0         0             69         0                 0
InternalResponseStage                  0         0          45432         0                 0
PerDiskMemtableFlushWriter_6           0         0             69         0                 0
PerDiskMemtableFlushWriter_3           0         0             69         0                 0
PerDiskMemtableFlushWriter_4           0         0             69         0                 0
PerDiskMemtableFlushWriter_9           0         0             69         0                 0
AntiEntropyStage                       0         0              0         0                 0
PerDiskMemtableFlushWriter_7           0         0             69         0                 0
PerDiskMemtableFlushWriter_8           0         0             69         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0


-dan

Re:

Posted by Jeff Jirsa <jj...@gmail.com>.
That read timeout looks like a nonissue (though we should catch it and
squash it differently). MigrationStage is backed up as well. Are you still
bouncing nodes? Have you fully upgraded the cluster at this point?
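
For anyone checking the same thing: binary version and schema agreement
across the ring are both visible with stock nodetool commands, run on any
node:

$ nodetool version         # each node should report ReleaseVersion: 3.11.0
$ nodetool describecluster # all nodes should share a single schema version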


On Thu, Sep 28, 2017 at 9:44 AM, Dan Kinder <dk...@turnitin.com> wrote:

> I should also note: I see nodes become locked up without that exception
> appearing. But the GossipStage buildup does seem correlated with gossip
> activity, e.g. me restarting a different node.
>
> [quoted original message trimmed]

Re:

Posted by Prem Yadav <ip...@gmail.com>.
Dan,
As part of upgrade, did you upgrade the sstables?
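
If not, the usual follow-up once every node is on the new binary is one of
these ('my_ks' is a placeholder keyspace name):

$ nodetool upgradesstables          # rewrite sstables still in the old format
$ nodetool upgradesstables my_ks    # or limit it to one keyspace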

Sent from mobile. Please excuse typos

On 28 Sep 2017 17:45, "Dan Kinder" <dk...@turnitin.com> wrote:

> I should also note: I see nodes become locked up without that exception
> appearing. But the GossipStage buildup does seem correlated with gossip
> activity, e.g. me restarting a different node.
>
> [quoted original message trimmed]

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
I should also note: I see nodes become locked up without that exception
appearing. But the GossipStage buildup does seem correlated with gossip
activity, e.g. me restarting a different node.
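
When it happens again, a snapshot of the local gossip view might help with
the correlation (stock nodetool commands; the output file name is just a
suggestion):

$ nodetool tpstats | grep -E "Pool Name|GossipStage"
$ nodetool gossipinfo > gossipinfo.$(hostname).txt   # diff between nodes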

On Thu, Sep 28, 2017 at 9:20 AM, Dan Kinder <dk...@turnitin.com> wrote:

> [quoted original message trimmed]



-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Sorry, about that ReadStage exception: I take it back; I had accidentally
ended up too early in the logs. The node whose ReadStage is building up shows
no exceptions in the logs.
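
Since nothing is logged, a thread dump of the stuck process seems like the
next useful artifact (<pid> is a placeholder for the Cassandra process id):

$ jstack <pid> > threads.$(date +%s).txt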

$ nodetool tpstats
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              8      1882          45881         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     9         9           2551         0                 0
MutationStage                          0         0       35929880         0                 0
GossipStage                            0         0          35793         0                 0
RequestResponseStage                   0         0         751285         0                 0
ReadRepairStage                        0         0            224         0                 0
CounterMutationStage                   0         0              0         0                 0
MemtableFlushWriter                    0         0            111         0                 0
MemtablePostFlush                      0         0            239         0                 0
ValidationExecutor                     0         0              0         0                 0
ViewMutationStage                      0         0              0         0                 0
CacheCleanupExecutor                   0         0              0         0                 0
PerDiskMemtableFlushWriter_10          0         0            104         0                 0
PerDiskMemtableFlushWriter_11          0         0            104         0                 0
MemtableReclaimMemory                  0         0            116         0                 0
PendingRangeCalculator                 0         0             16         0                 0
SecondaryIndexManagement               0         0              0         0                 0
HintsDispatcher                        0         0             13         0                 0
PerDiskMemtableFlushWriter_1           0         0            104         0                 0
Native-Transport-Requests              0         0        2607030         0                 0
PerDiskMemtableFlushWriter_2           0         0            104         0                 0
MigrationStage                         0         0            278         0                 0
PerDiskMemtableFlushWriter_0           0         0            115         0                 0
Sampler                                0         0              0         0                 0
PerDiskMemtableFlushWriter_5           0         0            104         0                 0
InternalResponseStage                  0         0            298         0                 0
PerDiskMemtableFlushWriter_6           0         0            104         0                 0
PerDiskMemtableFlushWriter_3           0         0            104         0                 0
PerDiskMemtableFlushWriter_4           0         0            104         0                 0
PerDiskMemtableFlushWriter_9           0         0            104         0                 0
AntiEntropyStage                       0         0              0         0                 0
PerDiskMemtableFlushWriter_7           0         0            104         0                 0
PerDiskMemtableFlushWriter_8           0         0            104         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0


On Thu, Sep 28, 2017 at 2:08 PM, Dan Kinder <dk...@turnitin.com> wrote:

> Thanks for the responses.
>
> @Prem yes this is after the entire cluster is on 3.11, but no I did not
> run upgradesstables yet.
>
> @Thomas no I don't see any major GC going on.
>
> @Jeff yeah it's fully upgraded. I decided to shut the whole thing down and
> bring it back (thankfully this cluster is not serving live traffic). The
> nodes seemed okay for an hour or two, but I see the issue again, without me
> bouncing any nodes. This time it's ReadStage that's building up, and the
> exception I'm seeing in the logs is:
>
> DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206 ReadCallback.java:242 - Digest mismatch:
>
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(6150926370328526396, 696a6374652e6f7267) (2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)
>
>         at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.11.0.jar:3.11.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_71]
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]
>
>
> Do you think running upgradesstables would help? Or relocatesstables? I
> presumed it shouldn't be necessary for Cassandra to function, just an
> optimization.
>
> On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <
> thomas.steinmaurer@dynatrace.com> wrote:
>
>> Dan,
>>
>>
>>
>> do you see any major GC? We have been hit by the following memory leak in
>> our loadtest environment with 3.11.0.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-13754
>>
>>
>>
>> So, depending on the heap size and uptime, you might get into heap
>> troubles.
>>
>>
>>
>> Thomas
>>
>>
>>
>> *From:* Dan Kinder [mailto:dkinder@turnitin.com]
>> *Sent:* Donnerstag, 28. September 2017 18:20
>> *To:* user@cassandra.apache.org
>> *Subject:*
>>
>> [quoted text trimmed]
>>
>
>
>
> --
> Dan Kinder
> Principal Software Engineer
> Turnitin – www.turnitin.com
> dkinder@turnitin.com
>



-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Created https://issues.apache.org/jira/browse/CASSANDRA-13923

On Mon, Oct 2, 2017 at 12:06 PM, Dan Kinder <dk...@turnitin.com> wrote:

> [quoted exchange trimmed]



-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Sure will do.

On Mon, Oct 2, 2017 at 11:48 AM, Jeff Jirsa <jj...@gmail.com> wrote:

> [quoted exchange trimmed]


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Jeff Jirsa <jj...@gmail.com>.
You're right, sorry, I didn't read the full stack (gmail hid it from me).

Would you open a JIRA with your stack traces, and note (somewhat loudly)
that this is a regression?


On Mon, Oct 2, 2017 at 11:43 AM, Dan Kinder <dk...@turnitin.com> wrote:

> [quoted exchange trimmed]

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Right, I just meant that calling it at all results in holding a read lock,
which unfortunately is blocking these read threads.
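
One way to see both sides of that contention in a dump (<pid> is a
placeholder; the class name comes from the stack traces in this thread):

$ jstack <pid> | grep -B 2 -A 8 "CompactionStrategyManager"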

On Mon, Oct 2, 2017 at 11:40 AM, Jeff Jirsa <jj...@gmail.com> wrote:

>
> [quoted exchange trimmed]


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Jeff Jirsa <jj...@gmail.com>.
On Mon, Oct 2, 2017 at 11:27 AM, Dan Kinder <dk...@turnitin.com> wrote:

> (As a side note, it seems silly to call shouldDefragment at all on a read
> if the compaction strategy is not STCS)
>
>
>
It defaults to false:

https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java#L302

And nothing else other than STCS overrides it to true.
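
Easy to confirm from a source checkout, e.g. (branch name taken from the link
above; expect hits for the false default in AbstractCompactionStrategy.java,
the true override in SizeTieredCompactionStrategy.java, and the read-locked
caller in CompactionStrategyManager.java):

$ git clone -b cassandra-3.0 --depth 1 https://github.com/apache/cassandra.git
$ grep -rn shouldDefragment cassandra/src/java/org/apache/cassandra/db/compaction/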

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
(As a side note, it seems silly to call shouldDefragment at all on a read
if the compaction strategy is not STCS)

On Mon, Oct 2, 2017 at 11:15 AM, Dan Kinder <dk...@turnitin.com> wrote:

> Sadly the cleanup setting didn't do it; I'm still investigating this and am
> open to input.
>
> Looking at the stack when reads are backed up, I do see something
> interesting. It looks like the compaction manager is blocked looking for
> candidates, and the read threads are blocked because they are trying to
> check shouldDefragment. Thread stacks pasted below.
>
> I guess I shouldn't be surprised that getCompactionCandidates is slow, since
> these are large nodes and each one has roughly 15,000 SSTables. This is not
> new; we were operating fine in 2.2.6 with this many SSTables on a single
> node. (Note: this is all LCS, with SSTable size 256MB.)
>
> Looking back at the first stack trace I uploaded, which had mutations
> building up, I do see many threads blocked on:
> - org.apache.cassandra.db.compaction.LeveledManifest.getLevel(int) @bci=5, line=769 (Compiled frame)
> - org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates() @bci=275, line=383 (Compiled frame)
>
> I may try increasing SSTable size and running a major compaction to see if
> that helps.
>
> Thread 7286: (state = IN_JAVA)
>
>  - java.util.AbstractCollection.addAll(java.util.Collection) @bci=19, line=343 (Compiled frame; information may be imprecise)
>  - java.util.HashSet.<init>(java.util.Collection) @bci=35, line=119 (Compiled frame)
>  - com.google.common.collect.Sets.newHashSet(java.lang.Iterable) @bci=15, line=218 (Compiled frame)
>  - org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates() @bci=84, line=353 (Compiled frame)
>  - org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int) @bci=4, line=119 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int) @bci=105, line=119 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run() @bci=84, line=262 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Compiled frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) @bci=1, line=81 (Compiled frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$4.run() @bci=4 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
>
>
> (I have 8 readers and they all look like this:)
>
> Thread 25981: (state = BLOCKED)
>
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=836 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=967 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1283 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=727 (Compiled frame)
>  - org.apache.cassandra.db.compaction.CompactionStrategyManager.shouldDefragment() @bci=4, line=406 (Interpreted frame)
>  - org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(org.apache.cassandra.db.ColumnFamilyStore, org.apache.cassandra.db.filter.ClusteringIndexNamesFilter) @bci=981, line=831 (Compiled frame)
>  - org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(org.apache.cassandra.db.ColumnFamilyStore) @bci=26, line=537 (Compiled frame)
>  - org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(org.apache.cassandra.db.ColumnFamilyStore, org.apache.cassandra.db.ReadExecutionController) @bci=37, line=514 (Compiled frame)
>  - org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(org.apache.cassandra.db.ColumnFamilyStore, org.apache.cassandra.db.ReadExecutionController) @bci=19, line=376 (Compiled frame)
>  - org.apache.cassandra.db.ReadCommand.executeLocally(org.apache.cassandra.db.ReadExecutionController) @bci=115, line=407 (Compiled frame)
>  - org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(org.apache.cassandra.net.MessageIn, int) @bci=59, line=48 (Compiled frame)
>  - org.apache.cassandra.net.MessageDeliveryTask.run() @bci=86, line=66 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
>  - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run() @bci=5, line=162 (Compiled frame)
>  - org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run() @bci=12, line=134 (Compiled frame)
>  - org.apache.cassandra.concurrent.SEPWorker.run() @bci=128, line=109 (Compiled frame)
>  - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
>
>
>
>
>
> On Fri, Sep 29, 2017 at 11:57 AM, Dan Kinder <dk...@turnitin.com> wrote:
>
>> There are 9 non-system tables.
>>
>> This machine has 64GB of memory and the memtable heap is deliberately
>> high to reduce compaction overhead.
>>
>> So far I'm hopeful; I am seeing much higher resource usage from the nodes
>> now (closer to expected).
>>
>> On Fri, Sep 29, 2017 at 11:34 AM, Jeff Jirsa <jj...@gmail.com> wrote:
>>
>>> Aside from that, a 16GB memtable is huge and probably creates a ton of
>>> objects on the heap. How many tables are in your cluster?
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Sep 29, 2017, at 11:18 AM, Dan Kinder <dk...@turnitin.com> wrote:
>>>
>>> I have a theory I'm testing out, after looking at the thread stack. Here
>>> it is:
>>>
>>> Carried over from before the upgrade, I have
>>> memtable_heap_space_in_mb: 16384 # 16GB
>>> memtable_cleanup_threshold: 0.11
>>> memtable_allocation_type: heap_buffers
>>>
>>> But since I'm on a JBOD with 12 disks, I'm letting memtable_flush_writers
>>> default to 2 per disk. So this would calculate a cleanup threshold of
>>> 1/(24+1) = 0.04 by default, which is very different from 0.11.
>>>
>>> Perhaps something about this configuration makes it lock up trying to
>>> allocate memtables?
>>>
>>> On Fri, Sep 29, 2017 at 10:35 AM, Thakrar, Jayesh <
>>> jthakrar@conversantmedia.com> wrote:
>>>
>>>> This looks like a potential disk / IO issue.
>>>>
>>>>
>>>>
>>>> See https://www.pythian.com/blog/guide-to-cassandra-thread-pools/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Dan Kinder <dk...@turnitin.com>
>>>> *Date: *Friday, September 29, 2017 at 11:47 AM
>>>> *To: *<us...@cassandra.apache.org>
>>>> *Subject: *Re:
>>>>
>>>>
>>>>
>>>> Attached a jstack thread dump from a node that's having trouble (see
>>>> tpstats paste as well)
>>>>
>>>>
>>>>
>>>> Pool Name                         Active   Pending      Completed   Blocked  All time blocked
>>>> ReadStage                              0         0         409111         0                 0
>>>> MiscStage                              0         0              0         0                 0
>>>> CompactionExecutor                    10        10           2481         0                 0
>>>> MutationStage                         32   1572402       31515477         0                 0
>>>> GossipStage                            1      1620         180091         0                 0
>>>> RequestResponseStage                   0         0       17509006         0                 0
>>>> ReadRepairStage                        0         0           4833         0                 0
>>>> CounterMutationStage                   0         0              0         0                 0
>>>> MemtableFlushWriter                    1        44             74         0                 0
>>>> MemtablePostFlush                      1       281            158         0                 0
>>>> ValidationExecutor                     0         0              0         0                 0
>>>> ViewMutationStage                      0         0              0         0                 0
>>>> CacheCleanupExecutor                   0         0              0         0                 0
>>>> PerDiskMemtableFlushWriter_10          0         0             67         0                 0
>>>> PerDiskMemtableFlushWriter_11          0         0             67         0                 0
>>>> MemtableReclaimMemory                  0         0             79         0                 0
>>>> PendingRangeCalculator                 0         0             19         0                 0
>>>> SecondaryIndexManagement               0         0              0         0                 0
>>>> HintsDispatcher                        1         1           1456         0                 0
>>>> PerDiskMemtableFlushWriter_1           0         0             67         0                 0
>>>> Native-Transport-Requests             69         0       17951984         0             26956
>>>> PerDiskMemtableFlushWriter_2           0         0             67         0                 0
>>>> MigrationStage                         1       201           1048         0                 0
>>>> PerDiskMemtableFlushWriter_0           0         0             78         0                 0
>>>> Sampler                                0         0              0         0                 0
>>>> PerDiskMemtableFlushWriter_5           0         0             67         0                 0
>>>> InternalResponseStage                  0         0        2754163         0                 0
>>>> PerDiskMemtableFlushWriter_6           0         0             67         0                 0
>>>> PerDiskMemtableFlushWriter_3           0         0             67         0                 0
>>>> PerDiskMemtableFlushWriter_4           0         0             67         0                 0
>>>> PerDiskMemtableFlushWriter_9           0         0             67         0                 0
>>>> AntiEntropyStage                       0         0              0         0                 0
>>>> PerDiskMemtableFlushWriter_7           0         0             67         0                 0
>>>> PerDiskMemtableFlushWriter_8           0         0             67         0                 0
>>>>
>>>> Message type           Dropped
>>>> READ                         0
>>>> RANGE_SLICE                  0
>>>> _TRACE                       0
>>>> HINT                         0
>>>> MUTATION                     0
>>>> COUNTER_MUTATION             0
>>>> BATCH_STORE                  0
>>>> BATCH_REMOVE                 0
>>>> REQUEST_RESPONSE             0
>>>> PAGED_RANGE                  0
>>>> READ_REPAIR                  0
>>>>
>>>>
>>>>
>>>> On Thu, Sep 28, 2017 at 2:27 PM, Dan Kinder <dk...@turnitin.com>
>>>> wrote:
>>>>
>>>> Unless I'm reading the logs wrong, it doesn't seem like the pauses are
>>>> very long... Attached the full log as well (everything since the node restart).
>>>>
>>>>
>>>>
>>>> $ grep -i "pause\|times" /var/log/cassandra/gc.log.0.current | tail -n
>>>> 40
>>>>
>>>> 2017-09-28T14:18:48.324-0700: 12866.695: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:18:58.198-0700: 12876.569: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.34 sys=0.00, real=0.03 secs]
>>>>
>>>> 2017-09-28T14:19:08.039-0700: 12886.411: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.30 sys=0.00, real=0.03 secs]
>>>>
>>>> 2017-09-28T14:19:17.900-0700: 12896.271: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>>>>
>>>> 2017-09-28T14:19:28.995-0700: 12907.366: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.31 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:19:40.576-0700: 12918.947: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.31 sys=0.01, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:19:52.164-0700: 12930.535: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.32 sys=0.01, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:20:03.813-0700: 12942.184: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:20:15.584-0700: 12953.955: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.35 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:20:27.261-0700: 12965.632: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.34 sys=0.00, real=0.03 secs]
>>>>
>>>> 2017-09-28T14:20:38.894-0700: 12977.265: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.34 sys=0.01, real=0.03 secs]
>>>>
>>>> 2017-09-28T14:20:50.685-0700: 12989.056: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>>>>
>>>> 2017-09-28T14:21:02.281-0700: 13000.652: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.31 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:21:13.975-0700: 13012.346: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.32 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:21:25.797-0700: 13024.168: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.32 sys=0.01, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:21:37.395-0700: 13035.766: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:21:49.166-0700: 13047.537: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.32 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:22:00.887-0700: 13059.258: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>>>>
>>>> 2017-09-28T14:22:12.453-0700: 13070.824: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.34 sys=0.00, real=0.02 secs]
>>>>
>>>> 2017-09-28T14:22:24.023-0700: 13082.394: [GC pause (G1 Evacuation
>>>> Pause) (young)
>>>>
>>>>  [Times: user=0.35 sys=0.00, real=0.02 secs]
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Sep 28, 2017 at 2:13 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>>>>
>>>> The digest mismatch exception is not a problem; that's why it's only
>>>> logged at debug.
>>>>
>>>>
>>>>
>>>> As Thomas noted, there's a pretty good chance this is
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13754 - if you see a
>>>> lot of GCInspector logs indicating GC pauses, that would add confidence to
>>>> that diagnosis.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Sep 28, 2017 at 2:08 PM, Dan Kinder <dk...@turnitin.com>
>>>> wrote:
>>>>
>>>> Thanks for the responses.
>>>>
>>>>
>>>>
>>>> @Prem yes, this is after the entire cluster is on 3.11, but no, I did not
>>>> run upgradesstables yet.
>>>>
>>>>
>>>>
>>>> @Thomas no I don't see any major GC going on.
>>>>
>>>>
>>>>
>>>> @Jeff yeah it's fully upgraded. I decided to shut the whole thing down
>>>> and bring it back (thankfully this cluster is not serving live traffic).
>>>> The nodes seemed okay for an hour or two, but I see the issue again,
>>>> without me bouncing any nodes. This time it's ReadStage that's building up,
>>>> and the exception I'm seeing in the logs is:
>>>>
>>>> DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206
>>>> ReadCallback.java:242 - Digest mismatch:
>>>>
>>>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>>>> DecoratedKey(6150926370328526396, 696a6374652e6f7267)
>>>> (2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)
>>>>
>>>>         at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>> [na:1.8.0_71]
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>> [na:1.8.0_71]
>>>>
>>>>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$th
>>>> readLocalDeallocator$0(NamedThreadFactory.java:81)
>>>> [apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]
>>>>
>>>>
>>>>
>>>> Do you think running upgradesstables would help? Or relocatesstables? I
>>>> presumed they shouldn't be necessary for Cassandra to function, just an
>>>> optimization.
>>>>
>>>>
>>>>
>>>> On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <
>>>> thomas.steinmaurer@dynatrace.com> wrote:
>>>>
>>>> Dan,
>>>>
>>>>
>>>>
>>>> do you see any major GC? We have been hit by the following memory leak
>>>> in our loadtest environment with 3.11.0.
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13754
>>>>
>>>>
>>>>
>>>> So, depending on the heap size and uptime, you might get into heap
>>>> troubles.
>>>>
>>>>
>>>>
>>>> Thomas
>>>>
>>>>
>>>>
>>>> *From:* Dan Kinder [mailto:dkinder@turnitin.com]
>>>> *Sent:* Donnerstag, 28. September 2017 18:20
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:*
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and see the
>>>> following. The cluster does function, for a while, but then some stages
>>>> begin to back up and the node does not recover and does not drain the
>>>> tasks, even under no load. This happens both to MutationStage and
>>>> GossipStage.
>>>>
>>>> I do see the following exception happen in the logs:
>>>>
>>>>
>>>>
>>>> ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440
>>>> CassandraDaemon.java:228 - Exception in thread
>>>> Thread[ReadRepairStage:2328,5,main]
>>>>
>>>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed
>>>> out - received only 1 responses.
>>>>
>>>>         at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterat
>>>> ors$2.close(UnfilteredPartitionIterators.java:182)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThr
>>>> ow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>> ~[na:1.8.0_91]
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>> ~[na:1.8.0_91]
>>>>
>>>>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$th
>>>> readLocalDeallocator$0(NamedThreadFactory.java:81)
>>>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>>>
>>>>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
>>>>
>>>>
>>>>
>>>> But it's hard to correlate precisely with things going bad. It is also
>>>> very strange to me since I have both read_repair_chance and
>>>> dclocal_read_repair_chance set to 0.0 for ALL of my tables. So it is
>>>> confusing why ReadRepairStage would err.
>>>>
>>>> Anyone have thoughts on this? It's pretty muddling, and causes nodes to
>>>> lock up. Once it happens Cassandra can't even shut down, I have to kill -9.
>>>> If I can't find a resolution I'm going to need to downgrade and restore to
>>>> backup...
>>>>
>>>> The only issue I found that looked similar is
>>>> https://issues.apache.org/jira/browse/CASSANDRA-12689 but that appears
>>>> to be fixed by 3.10.
>>>>
>>>>
>>>>
>>>> $ nodetool tpstats
>>>>
>>>> Pool Name                         Active   Pending      Completed
>>>> Blocked  All time blocked
>>>>
>>>> ReadStage                              0         0         582103
>>>>     0                 0
>>>>
>>>> MiscStage                              0         0              0
>>>>     0                 0
>>>>
>>>> CompactionExecutor                    11        11           2868
>>>>     0                 0
>>>>
>>>> MutationStage                         32   4593678       55057393
>>>>     0                 0
>>>>
>>>> GossipStage                            1      2818         371487
>>>>     0                 0
>>>>
>>>> RequestResponseStage                   0         0        4345522
>>>>     0                 0
>>>>
>>>> ReadRepairStage                        0         0         151473
>>>>     0                 0
>>>>
>>>> CounterMutationStage                   0         0              0
>>>>     0                 0
>>>>
>>>> MemtableFlushWriter                    1        81             76
>>>>     0                 0
>>>>
>>>> MemtablePostFlush                      1       382            139
>>>>     0                 0
>>>>
>>>> ValidationExecutor                     0         0              0
>>>>     0                 0
>>>>
>>>> ViewMutationStage                      0         0              0
>>>>     0                 0
>>>>
>>>> CacheCleanupExecutor                   0         0              0
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_10          0         0             69
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_11          0         0             69
>>>>     0                 0
>>>>
>>>> MemtableReclaimMemory                  0         0             81
>>>>     0                 0
>>>>
>>>> PendingRangeCalculator                 0         0             32
>>>>     0                 0
>>>>
>>>> SecondaryIndexManagement               0         0              0
>>>>     0                 0
>>>>
>>>> HintsDispatcher                        0         0            596
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_1           0         0             69
>>>>     0                 0
>>>>
>>>> Native-Transport-Requests             11         0        4547746
>>>>     0                67
>>>>
>>>> PerDiskMemtableFlushWriter_2           0         0             69
>>>>     0                 0
>>>>
>>>> MigrationStage                         1      1545            586
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_0           0         0             80
>>>>     0                 0
>>>>
>>>> Sampler                                0         0              0
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_5           0         0             69
>>>>     0                 0
>>>>
>>>> InternalResponseStage                  0         0          45432
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_6           0         0             69
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_3           0         0             69
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_4           0         0             69
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_9           0         0             69
>>>>     0                 0
>>>>
>>>> AntiEntropyStage                       0         0              0
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_7           0         0             69
>>>>     0                 0
>>>>
>>>> PerDiskMemtableFlushWriter_8           0         0             69
>>>>     0                 0
>>>>
>>>>
>>>>
>>>> Message type           Dropped
>>>>
>>>> READ                         0
>>>>
>>>> RANGE_SLICE                  0
>>>>
>>>> _TRACE                       0
>>>>
>>>> HINT                         0
>>>>
>>>> MUTATION                     0
>>>>
>>>> COUNTER_MUTATION             0
>>>>
>>>> BATCH_STORE                  0
>>>>
>>>> BATCH_REMOVE                 0
>>>>
>>>> REQUEST_RESPONSE             0
>>>>
>>>> PAGED_RANGE                  0
>>>>
>>>> READ_REPAIR                  0
>>>>
>>>>
>>>>
>>>> -dan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Dan Kinder
>>>> Principal Software Engineer
>>>> Turnitin – www.turnitin.com
>>>> dkinder@turnitin.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Dan Kinder
>>>> Principal Software Engineer
>>>> Turnitin – www.turnitin.com
>>>> dkinder@turnitin.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Dan Kinder
>>>> Principal Software Engineer
>>>> Turnitin – www.turnitin.com
>>>> dkinder@turnitin.com
>>>>
>>>
>>>
>>
>>
>> --
>> Dan Kinder
>> Principal Software Engineer
>> Turnitin – www.turnitin.com
>> dkinder@turnitin.com
>>
>
>
>
> --
> Dan Kinder
> Principal Software Engineer
> Turnitin – www.turnitin.com
> dkinder@turnitin.com
>



-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Sadly, the cleanup setting didn't do it; I'm still investigating this and am
open to input.
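
For reference, since this is the knob I was testing: memtable_cleanup_threshold
defaults to 1 / (memtable_flush_writers + 1), so the numbers in play are
roughly:

    12 disks * 2 flush writers per disk = 24 writers
    default threshold = 1/(24+1) = 0.04  ->  flush at ~655 MB of 16384 MB
    my explicit 0.11                     ->  flush at ~1802 MB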

Looking at the stacks when reads are backed up, I do see something
interesting. It looks like the compaction manager is blocked looking for
compaction candidates, and the read threads are blocked because they are
trying to check shouldDefragment(). Thread stacks are pasted below.

I guess I shouldn't be surprised that getCompactionCandidates would be slow,
since these are large nodes and each one has roughly 15,000 SSTables. This is
not new; we were operating fine on 2.2.6 with this many SSTables on a single
node. (Note: this is all LCS, with an SSTable size of 256MB.)
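
Back-of-the-envelope: 15,000 SSTables * 256 MB is roughly 3.8 TB per node, and
judging by the Sets.newHashSet frame in the stacks below,
getCompactionCandidates copies the full SSTable set on each call, so every
scan touches all ~15,000 entries.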

Looking back at the first stack trace I uploaded, which had mutations
building up, I do see many threads blocked on:
- org.apache.cassandra.db.compaction.LeveledManifest.getLevel(int) @bci=5,
line=769 (Compiled frame)
- org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates()
@bci=275, line=383 (Compiled frame)
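
This smells like read/write lock contention: the reader stacks park inside
CompactionStrategyManager.shouldDefragment() on the read side of a
ReentrantReadWriteLock, and getNextBackgroundTask() presumably takes the write
side of the same lock, so one slow candidate scan would park every reader. A
minimal sketch of that shape (just the locking pattern, not the actual
Cassandra code; the class name and timings are made up):

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockContentionSketch {
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for getNextBackgroundTask(): holds the write lock while
    // scanning a large SSTable set for compaction candidates.
    static void pickCandidates() throws InterruptedException {
        lock.writeLock().lock();
        try {
            Thread.sleep(5000); // pretend the scan over ~15k SSTables is slow
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Stand-in for shouldDefragment(): trivial, but it still takes the read
    // lock, so it parks whenever the write lock is held.
    static boolean shouldDefragment() {
        lock.readLock().lock();
        try {
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread compactor = new Thread(() -> {
            try { pickCandidates(); } catch (InterruptedException ignored) { }
        });
        compactor.start();
        Thread.sleep(100); // let the "compactor" grab the write lock first
        // These readers all park for ~5 seconds, much like the blocked
        // ReadStage threads in the jstack output below.
        for (int i = 0; i < 8; i++) {
            new Thread(() -> System.out.println(shouldDefragment())).start();
        }
    }
}

If that's right, anything that makes the candidate scan cheaper should
unblock the readers.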

I may try increasing SSTable size and running a major compaction to see if
that helps.
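
If I do, the change itself would be something like the following (ks.tbl is a
made-up table name and 512 MB is just an example target size), followed by a
major compaction on each node:

cqlsh> ALTER TABLE ks.tbl
   ... WITH compaction = {'class': 'LeveledCompactionStrategy',
   ...                    'sstable_size_in_mb': '512'};

$ nodetool compact ks tbl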

Thread 7286: (state = IN_JAVA)

 - java.util.AbstractCollection.addAll(java.util.Collection) @bci=19,
line=343 (Compiled frame; information may be imprecise)

 - java.util.HashSet.<init>(java.util.Collection) @bci=35, line=119
(Compiled frame)

 - com.google.common.collect.Sets.newHashSet(java.lang.Iterable) @bci=15,
line=218 (Compiled frame)

 -
org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates()
@bci=84, line=353 (Compiled frame)

 - org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int)
@bci=4, line=119 (Compiled frame)

 - org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(int)
@bci=105, line=119 (Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run()
@bci=84, line=262 (Compiled frame)

 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
(Compiled frame)

 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
(Compiled frame)

 - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

 -
java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1149 (Compiled frame)

 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624
(Compiled frame)

 -
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
@bci=1, line=81 (Compiled frame)

 - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$4.run()
@bci=4 (Compiled frame)

 - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)


(I have 8 readers and they all look like this:)

Thread 25981: (state = BLOCKED)

 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information
may be imprecise)

 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14,
line=175 (Compiled frame)

 -
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()
@bci=1, line=836 (Compiled frame)

 -
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int)
@bci=83, line=967 (Interpreted frame)

 - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int)
@bci=10, line=1283 (Compiled frame)

 - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock()
@bci=5, line=727 (Compiled frame)

 - org.apache.cassandra.db.compaction.CompactionStrategyManager.shouldDefragment()
@bci=4, line=406 (Interpreted frame)

 -
org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(org.apache.cassandra.db.ColumnFamilyStore,
org.apache.cassandra.db.filter.ClusteringIndexNamesFilter) @bci=981,
line=831 (Compiled frame)

 -
org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(org.apache.cassandra.db.ColumnFamilyStore)
@bci=26, line=537 (Compiled frame)

 -
org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(org.apache.cassandra.db.ColumnFamilyStore,
org.apache.cassandra.db.ReadExecutionController) @bci=37, line=514
(Compiled frame)

 -
org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(org.apache.cassandra.db.ColumnFamilyStore,
org.apache.cassandra.db.ReadExecutionController) @bci=19, line=376
(Compiled frame)

 -
org.apache.cassandra.db.ReadCommand.executeLocally(org.apache.cassandra.db.ReadExecutionController)
@bci=115, line=407 (Compiled frame)

 -
org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(org.apache.cassandra.net.MessageIn,
int) @bci=59, line=48 (Compiled frame)

 - org.apache.cassandra.net.MessageDeliveryTask.run() @bci=86, line=66
(Compiled frame)

 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
(Compiled frame)

 -
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run()
@bci=5, line=162 (Compiled frame)

 -
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run()
@bci=12, line=134 (Compiled frame)

 - org.apache.cassandra.concurrent.SEPWorker.run() @bci=128, line=109
(Compiled frame)

 - java.lang.Thread.run() @bci=11, line=748 (Compiled frame)
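
For anyone following along: the @bci-style frames above are from jstack's
forced mode, and a quick way to count how many readers are parked is, e.g.:

$ jstack -F <cassandra-pid> | grep -c shouldDefragment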





On Fri, Sep 29, 2017 at 11:57 AM, Dan Kinder <dk...@turnitin.com> wrote:

> There are 9 non-system tables.
>
> This machine has 64GB of memory, and the memtable heap is deliberately high
> to reduce compaction overhead.
>
> So far I'm hopeful; I am seeing much higher resource utilization from the
> nodes now (closer to what I'd expect).
>
> On Fri, Sep 29, 2017 at 11:34 AM, Jeff Jirsa <jj...@gmail.com> wrote:
>
>> Aside from that, a 16GB memtable is huge and is probably creating a ton of
>> objects in heap. How many tables are in your cluster?
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Sep 29, 2017, at 11:18 AM, Dan Kinder <dk...@turnitin.com> wrote:
>>
>> I have a theory I'm testing out, after looking at the thread stack. Here
>> it is:
>>
>> Carried over from before the upgrade, I have
>> memtable_heap_space_in_mb: 16384 # 16GB
>> memtable_cleanup_threshold: 0.11
>> memtable_allocation_type: heap_buffers
>>
>> But since I'm on a JBOD with 12 disks, I'm letting memtable_flush_writers
>> default to 2 per disk. So this would compute a cleanup threshold of
>> 1/(24+1) = 0.04 by default, which is very different from 0.11.
>>
>> Perhaps something about this configuration makes it lock up trying to
>> allocate memtables?

-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
There are 9 non-system tables.

This machine has 64GB of memory, and the memtable heap is deliberately high
to reduce compaction overhead.

So far I'm hopeful; I am seeing much higher resource utilization from the
nodes now (closer to what I'd expect).
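
For reference, the relevant slice of my cassandra.yaml (same values as in my
earlier mail; memtable_flush_writers is the computed default, not set
explicitly):

memtable_heap_space_in_mb: 16384       # 16 GB of the 64 GB box
memtable_cleanup_threshold: 0.11       # default would be 1/(24+1) = 0.04
memtable_allocation_type: heap_buffers
# memtable_flush_writers: 24           # unset: 2 per JBOD disk * 12 disks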

On Fri, Sep 29, 2017 at 11:34 AM, Jeff Jirsa <jj...@gmail.com> wrote:

> Aside from that, a 16GB memtable is huge and is probably creating a ton of
> objects in heap. How many tables are in your cluster?
>
>
> --
> Jeff Jirsa
>
>
> On Sep 29, 2017, at 11:18 AM, Dan Kinder <dk...@turnitin.com> wrote:
>
> I have a theory I'm testing out, after looking at the thread stack. Here
> it is:
>
> Carried over from before the upgrade, I have
> memtable_heap_space_in_mb: 16384 # 16GB
> memtable_cleanup_threshold: 0.11
> memtable_allocation_type: heap_buffers
>
> But since I'm on a jbod with 12 disks, I'm letting memtable_flush_writers
> default to 2 per disk. So this would calculate a cleanup threshold of
> 1/(24+1) = 0.04 by default. Which is very different from 0.11.
>
> Perhaps something about this configuration makes it lock up trying to
> allocate memtables?
>
> On Fri, Sep 29, 2017 at 10:35 AM, Thakrar, Jayesh <
> jthakrar@conversantmedia.com> wrote:
>
>> This looks like a potential disk / IO issue.
>>
>>
>>
>> See https://www.pythian.com/blog/guide-to-cassandra-thread-pools/
>>
>>
>>
>>
>>
>> *From: *Dan Kinder <dk...@turnitin.com>
>> *Date: *Friday, September 29, 2017 at 11:47 AM
>> *To: *<us...@cassandra.apache.org>
>> *Subject: *Re:
>>
>>
>>
>> Attached a jstack thread dump from a node that's having trouble (see
>> tpstats paste as well)
>>
>>
>>
>> Pool Name                         Active   Pending      Completed
>> Blocked  All time blocked
>>
>> ReadStage                              0         0         409111
>>   0                 0
>>
>> MiscStage                              0         0              0
>>   0                 0
>>
>> CompactionExecutor                    10        10           2481
>>   0                 0
>>
>> MutationStage                         32   1572402       31515477
>>   0                 0
>>
>> GossipStage                            1      1620         180091
>>   0                 0
>>
>> RequestResponseStage                   0         0       17509006
>>   0                 0
>>
>> ReadRepairStage                        0         0           4833
>>   0                 0
>>
>> CounterMutationStage                   0         0              0
>>   0                 0
>>
>> MemtableFlushWriter                    1        44             74
>>   0                 0
>>
>> MemtablePostFlush                      1       281            158
>>   0                 0
>>
>> ValidationExecutor                     0         0              0
>>   0                 0
>>
>> ViewMutationStage                      0         0              0
>>   0                 0
>>
>> CacheCleanupExecutor                   0         0              0
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_10          0         0             67
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_11          0         0             67
>>   0                 0
>>
>> MemtableReclaimMemory                  0         0             79
>>   0                 0
>>
>> PendingRangeCalculator                 0         0             19
>>   0                 0
>>
>> SecondaryIndexManagement               0         0              0
>>   0                 0
>>
>> HintsDispatcher                        1         1           1456
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_1           0         0             67
>>   0                 0
>>
>> Native-Transport-Requests             69         0       17951984
>>   0             26956
>>
>> PerDiskMemtableFlushWriter_2           0         0             67
>>   0                 0
>>
>> MigrationStage                         1       201           1048
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_0           0         0             78
>>   0                 0
>>
>> Sampler                                0         0              0
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_5           0         0             67
>>   0                 0
>>
>> InternalResponseStage                  0         0        2754163
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_6           0         0             67
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_3           0         0             67
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_4           0         0             67
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_9           0         0             67
>>   0                 0
>>
>> AntiEntropyStage                       0         0              0
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_7           0         0             67
>>   0                 0
>>
>> PerDiskMemtableFlushWriter_8           0         0             67
>>   0                 0
>>
>>
>>
>> Message type           Dropped
>>
>> READ                         0
>>
>> RANGE_SLICE                  0
>>
>> _TRACE                       0
>>
>> HINT                         0
>>
>> MUTATION                     0
>>
>> COUNTER_MUTATION             0
>>
>> BATCH_STORE                  0
>>
>> BATCH_REMOVE                 0
>>
>> REQUEST_RESPONSE             0
>>
>> PAGED_RANGE                  0
>>
>> READ_REPAIR                  0
>>
>>
>>
>> On Thu, Sep 28, 2017 at 2:27 PM, Dan Kinder <dk...@turnitin.com> wrote:
>>
>> Unless I'm reading the logs wrong, it doesn't seem like the pauses are
>> very long... Attached the full log as well since node restart.
>>
>>
>>
>> $ grep -i "pause\|times" /var/log/cassandra/gc.log.0.current | tail -n 40
>>
>> 2017-09-28T14:18:48.324-0700: 12866.695: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:18:58.198-0700: 12876.569: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.34 sys=0.00, real=0.03 secs]
>>
>> 2017-09-28T14:19:08.039-0700: 12886.411: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.30 sys=0.00, real=0.03 secs]
>>
>> 2017-09-28T14:19:17.900-0700: 12896.271: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>>
>> 2017-09-28T14:19:28.995-0700: 12907.366: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.31 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:19:40.576-0700: 12918.947: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.31 sys=0.01, real=0.02 secs]
>>
>> 2017-09-28T14:19:52.164-0700: 12930.535: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.32 sys=0.01, real=0.02 secs]
>>
>> 2017-09-28T14:20:03.813-0700: 12942.184: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:20:15.584-0700: 12953.955: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.35 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:20:27.261-0700: 12965.632: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.34 sys=0.00, real=0.03 secs]
>>
>> 2017-09-28T14:20:38.894-0700: 12977.265: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.34 sys=0.01, real=0.03 secs]
>>
>> 2017-09-28T14:20:50.685-0700: 12989.056: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>>
>> 2017-09-28T14:21:02.281-0700: 13000.652: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.31 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:21:13.975-0700: 13012.346: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.32 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:21:25.797-0700: 13024.168: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.32 sys=0.01, real=0.02 secs]
>>
>> 2017-09-28T14:21:37.395-0700: 13035.766: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.33 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:21:49.166-0700: 13047.537: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.32 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:22:00.887-0700: 13059.258: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.33 sys=0.00, real=0.03 secs]
>>
>> 2017-09-28T14:22:12.453-0700: 13070.824: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.34 sys=0.00, real=0.02 secs]
>>
>> 2017-09-28T14:22:24.023-0700: 13082.394: [GC pause (G1 Evacuation Pause)
>> (young)
>>
>>  [Times: user=0.35 sys=0.00, real=0.02 secs]
>>
>>
>>
>>
>>
>> On Thu, Sep 28, 2017 at 2:13 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>>
>> The digest mismatch exception is not a problem, that's why it's only
>> logged at debug.
>>
>>
>>
>> As Thomas noted, there's a pretty good chance this is
>> https://issues.apache.org/jira/browse/CASSANDRA-13754 - if you see a lot
>> of GCInspector logs indicating GC pauses, that would add confidence to that
>> diagnosis.
>>
>>
>>
>>
>>
>> On Thu, Sep 28, 2017 at 2:08 PM, Dan Kinder <dk...@turnitin.com> wrote:
>>
>> Thanks for the responses.
>>
>>
>>
>> @Prem yes this is after the entire cluster is on 3.11, but no I did not
>> run upgradesstables yet.
>>
>>
>>
>> @Thomas no I don't see any major GC going on.
>>
>>
>>
>> @Jeff yeah it's fully upgraded. I decided to shut the whole thing down
>> and bring it back (thankfully this cluster is not serving live traffic).
>> The nodes seemed okay for an hour or two, but I see the issue again,
>> without me bouncing any nodes. This time it's ReadStage that's building up,
>> and the exception I'm seeing in the logs is:
>>
>> DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206 ReadCallback.java:242
>> - Digest mismatch:
>>
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>> DecoratedKey(6150926370328526396, 696a6374652e6f7267)
>> (2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)
>>
>>         at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_71]
>>
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_71]
>>
>>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$th
>> readLocalDeallocator$0(NamedThreadFactory.java:81)
>> [apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]
>>
>>
>>
>> Do you think running upgradesstables would help? Or relocatesstables? I
>> presumed it shouldn't be necessary for Cassandra to function, just an
>> optimization.
>>
>>
>>
>> On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <
>> thomas.steinmaurer@dynatrace.com> wrote:
>>
>> Dan,
>>
>>
>>
>> do you see any major GC? We have been hit by the following memory leak in
>> our loadtest environment with 3.11.0.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-13754
>>
>>
>>
>> So, depending on the heap size and uptime, you might get into heap
>> troubles.
>>
>>
>>
>> Thomas
>>
>>
>>
>> *From:* Dan Kinder [mailto:dkinder@turnitin.com]
>> *Sent:* Donnerstag, 28. September 2017 18:20
>> *To:* user@cassandra.apache.org
>> *Subject:*
>>
>>
>>
>> Hi,
>>
>> I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and see the
>> following. The cluster does function, for a while, but then some stages
>> begin to back up and the node does not recover and does not drain the
>> tasks, even under no load. This happens both to MutationStage and
>> GossipStage.
>>
>> I do see the following exception happen in the logs:
>>
>>
>>
>> ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440
>> CassandraDaemon.java:228 - Exception in thread
>> Thread[ReadRepairStage:2328,5,main]
>>
>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed
>> out - received only 1 responses.
>>
>>         at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterat
>> ors$2.close(UnfilteredPartitionIterators.java:182)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThr
>> ow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> ~[na:1.8.0_91]
>>
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> ~[na:1.8.0_91]
>>
>>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$th
>> readLocalDeallocator$0(NamedThreadFactory.java:81)
>> ~[apache-cassandra-3.11.0.jar:3.11.0]
>>
>>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
>>
>>
>>
>> But it's hard to correlate precisely with things going bad. It is also
>> very strange to me since I have both read_repair_chance and
>> dclocal_read_repair_chance set to 0.0 for ALL of my tables. So it is
>> confusing why ReadRepairStage would err.
>>
>> Anyone have thoughts on this? It's pretty muddling, and causes nodes to
>> lock up. Once it happens Cassandra can't even shut down, I have to kill -9.
>> If I can't find a resolution I'm going to need to downgrade and restore to
>> backup...
>>
>> The only issue I found that looked similar is https://issues.apache.org/j
>> ira/browse/CASSANDRA-12689 but that appears to be fixed by 3.10.
>>
>> -dan
>>
>
>


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Jeff Jirsa <jj...@gmail.com>.
Aside from that, a 16GB memtable is huge and probably creating a ton of objects in heap. How many tables are in your cluster?
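
One rough way to count them (a sketch, assuming 3.11's nodetool tablestats output, which prints one "Table:" line per table):

$ nodetool tablestats | grep -c 'Table:'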


-- 
Jeff Jirsa


> On Sep 29, 2017, at 11:18 AM, Dan Kinder <dk...@turnitin.com> wrote:
> 
> I have a theory I'm testing out, after looking at the thread stack. Here it is:
> 
> Carried over from before the upgrade, I have
> memtable_heap_space_in_mb: 16384 # 16GB
> memtable_cleanup_threshold: 0.11
> memtable_allocation_type: heap_buffers
> 
> But since I'm on a JBOD with 12 disks, I'm letting memtable_flush_writers
> default to 2 per disk. So this would calculate a cleanup threshold of
> 1/(24+1) = 0.04 by default, which is very different from 0.11.
> 

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
I have a theory I'm testing out, after looking at the thread stack. Here it
is:

Carried over from before the upgrade, I have
memtable_heap_space_in_mb: 16384 # 16GB
memtable_cleanup_threshold: 0.11
memtable_allocation_type: heap_buffers

But since I'm on a JBOD with 12 disks, I'm letting memtable_flush_writers
default to 2 per disk. So this would calculate a cleanup threshold of
1/(24+1) = 0.04 by default, which is very different from 0.11.

Perhaps something about this configuration makes it lock up trying to
allocate memtables?
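
As a quick sanity check of that arithmetic (a sketch, assuming the documented default of memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)):

$ echo "scale=2; 1 / (12 * 2 + 1)" | bc
.04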

On Fri, Sep 29, 2017 at 10:35 AM, Thakrar, Jayesh <jthakrar@conversantmedia.com> wrote:

> This looks like a potential disk / IO issue.
>
> See https://www.pythian.com/blog/guide-to-cassandra-thread-pools/

Re:

Posted by "Thakrar, Jayesh" <jt...@conversantmedia.com>.
This looks like a potential disk / IO issue.

See https://www.pythian.com/blog/guide-to-cassandra-thread-pools/
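
If it is disk / IO, something like iostat should show it (a sketch, assuming the sysstat package is installed; watch %util and await on the JBOD devices):

$ iostat -x 5 3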


From: Dan Kinder <dk...@turnitin.com>
Date: Friday, September 29, 2017 at 11:47 AM
To: <us...@cassandra.apache.org>
Subject: Re:

Attached a jstack thread dump from a node that's having trouble (see tpstats paste as well)
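
A dump like this can be captured with plain jstack (a sketch; <pid> is the Cassandra process id):

$ jstack -l <pid> > threads.txt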


Pool Name                         Active   Pending      Completed   Blocked  All time blocked

ReadStage                              0         0         409111         0                 0

MiscStage                              0         0              0         0                 0

CompactionExecutor                    10        10           2481         0                 0

MutationStage                         32   1572402       31515477         0                 0

GossipStage                            1      1620         180091         0                 0

RequestResponseStage                   0         0       17509006         0                 0

ReadRepairStage                        0         0           4833         0                 0

CounterMutationStage                   0         0              0         0                 0

MemtableFlushWriter                    1        44             74         0                 0

MemtablePostFlush                      1       281            158         0                 0

ValidationExecutor                     0         0              0         0                 0

ViewMutationStage                      0         0              0         0                 0

CacheCleanupExecutor                   0         0              0         0                 0

PerDiskMemtableFlushWriter_10          0         0             67         0                 0

PerDiskMemtableFlushWriter_11          0         0             67         0                 0

MemtableReclaimMemory                  0         0             79         0                 0

PendingRangeCalculator                 0         0             19         0                 0

SecondaryIndexManagement               0         0              0         0                 0

HintsDispatcher                        1         1           1456         0                 0

PerDiskMemtableFlushWriter_1           0         0             67         0                 0

Native-Transport-Requests             69         0       17951984         0             26956

PerDiskMemtableFlushWriter_2           0         0             67         0                 0

MigrationStage                         1       201           1048         0                 0

PerDiskMemtableFlushWriter_0           0         0             78         0                 0

Sampler                                0         0              0         0                 0

PerDiskMemtableFlushWriter_5           0         0             67         0                 0

InternalResponseStage                  0         0        2754163         0                 0

PerDiskMemtableFlushWriter_6           0         0             67         0                 0

PerDiskMemtableFlushWriter_3           0         0             67         0                 0

PerDiskMemtableFlushWriter_4           0         0             67         0                 0

PerDiskMemtableFlushWriter_9           0         0             67         0                 0

AntiEntropyStage                       0         0              0         0                 0

PerDiskMemtableFlushWriter_7           0         0             67         0                 0

PerDiskMemtableFlushWriter_8           0         0             67         0                 0



Message type           Dropped

READ                         0

RANGE_SLICE                  0

_TRACE                       0

HINT                         0

MUTATION                     0

COUNTER_MUTATION             0

BATCH_STORE                  0

BATCH_REMOVE                 0

REQUEST_RESPONSE             0

PAGED_RANGE                  0

READ_REPAIR                  0

On Thu, Sep 28, 2017 at 2:27 PM, Dan Kinder <dk...@turnitin.com> wrote:
Unless I'm reading the logs wrong, it doesn't seem like the pauses are very long... Attached the full log as well, covering everything since the node restart.

$ grep -i "pause\|times" /var/log/cassandra/gc.log.0.current | tail -n 40
2017-09-28T14:18:48.324-0700: 12866.695: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.33 sys=0.00, real=0.02 secs]
2017-09-28T14:18:58.198-0700: 12876.569: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.34 sys=0.00, real=0.03 secs]
2017-09-28T14:19:08.039-0700: 12886.411: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.30 sys=0.00, real=0.03 secs]
2017-09-28T14:19:17.900-0700: 12896.271: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.33 sys=0.00, real=0.03 secs]
2017-09-28T14:19:28.995-0700: 12907.366: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.31 sys=0.00, real=0.02 secs]
2017-09-28T14:19:40.576-0700: 12918.947: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.31 sys=0.01, real=0.02 secs]
2017-09-28T14:19:52.164-0700: 12930.535: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.32 sys=0.01, real=0.02 secs]
2017-09-28T14:20:03.813-0700: 12942.184: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.33 sys=0.00, real=0.02 secs]
2017-09-28T14:20:15.584-0700: 12953.955: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.35 sys=0.00, real=0.02 secs]
2017-09-28T14:20:27.261-0700: 12965.632: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.34 sys=0.00, real=0.03 secs]
2017-09-28T14:20:38.894-0700: 12977.265: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.34 sys=0.01, real=0.03 secs]
2017-09-28T14:20:50.685-0700: 12989.056: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.33 sys=0.00, real=0.03 secs]
2017-09-28T14:21:02.281-0700: 13000.652: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.31 sys=0.00, real=0.02 secs]
2017-09-28T14:21:13.975-0700: 13012.346: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.32 sys=0.00, real=0.02 secs]
2017-09-28T14:21:25.797-0700: 13024.168: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.32 sys=0.01, real=0.02 secs]
2017-09-28T14:21:37.395-0700: 13035.766: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.33 sys=0.00, real=0.02 secs]
2017-09-28T14:21:49.166-0700: 13047.537: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.32 sys=0.00, real=0.02 secs]
2017-09-28T14:22:00.887-0700: 13059.258: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.33 sys=0.00, real=0.03 secs]
2017-09-28T14:22:12.453-0700: 13070.824: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.34 sys=0.00, real=0.02 secs]
2017-09-28T14:22:24.023-0700: 13082.394: [GC pause (G1 Evacuation Pause) (young)
 [Times: user=0.35 sys=0.00, real=0.02 secs]


On Thu, Sep 28, 2017 at 2:13 PM, Jeff Jirsa <jj...@gmail.com> wrote:
The digest mismatch exception is not a problem, that's why it's only logged at debug.

As Thomas noted, there's a pretty good chance this is https://issues.apache.org/jira/browse/CASSANDRA-13754 - if you see a lot of GCInspector logs indicating GC pauses, that would add confidence to that diagnosis.
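
A quick way to check for those (a sketch, assuming the default log location):

$ grep GCInspector /var/log/cassandra/system.log | tail -n 20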


On Thu, Sep 28, 2017 at 2:08 PM, Dan Kinder <dk...@turnitin.com> wrote:
Thanks for the responses.

@Prem yes this is after the entire cluster is on 3.11, but no I did not run upgradesstables yet.

@Thomas no I don't see any major GC going on.

@Jeff yeah it's fully upgraded. I decided to shut the whole thing down and bring it back (thankfully this cluster is not serving live traffic). The nodes seemed okay for an hour or two, but I see the issue again, without me bouncing any nodes. This time it's ReadStage that's building up, and the exception I'm seeing in the logs is:

DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206 ReadCallback.java:242 - Digest mismatch:

org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(6150926370328526396, 696a6374652e6f7267) (2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)

        at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_71]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_71]

        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]

        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]



Do you think running upgradesstables would help? Or relocatesstables? I presumed it shouldn't be necessary for Cassandra to function, just an optimization.
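
Both are per-node nodetool operations (a sketch, assuming both subcommands are available in 3.11; run them on each node):

$ nodetool upgradesstables
$ nodetool relocatesstables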

On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <th...@dynatrace.com> wrote:
Dan,

do you see any major GC? We have been hit by the following memory leak in our loadtest environment with 3.11.0.
https://issues.apache.org/jira/browse/CASSANDRA-13754

So, depending on the heap size and uptime, you might get into heap troubles.
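
One quick way to watch for that (a sketch using nodetool info, which reports used and max heap in MB):

$ nodetool info | grep -i heap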

Thomas

From: Dan Kinder [mailto:dkinder@turnitin.com<ma...@turnitin.com>]
Sent: Donnerstag, 28. September 2017 18:20
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject:


Hi,

I recently upgraded our 16-node cluster from 2.2.6 to 3.11 and see the following. The cluster does function, for a while, but then some stages begin to back up and the node does not recover and does not drain the tasks, even under no load. This happens both to MutationStage and GossipStage.

I do see the following exception happen in the logs:



ERROR [ReadRepairStage:2328] 2017-09-26 23:07:55,440 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:2328,5,main]

org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.

        at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:182) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:82) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:89) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_91]

        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.0.jar:3.11.0]

        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]



But it's hard to correlate precisely with things going bad. It is also very strange to me since I have both read_repair_chance and dclocal_read_repair_chance set to 0.0 for ALL of my tables. So it is confusing why ReadRepairStage would err.

Anyone have thoughts on this? It's pretty muddling, and causes nodes to lock up. Once it happens Cassandra can't even shut down, I have to kill -9. If I can't find a resolution I'm going to need to downgrade and restore to backup...

The only issue I found that looked similar is https://issues.apache.org/jira/browse/CASSANDRA-12689 but that appears to be fixed by 3.10.



$ nodetool tpstats

Pool Name                         Active   Pending      Completed   Blocked  All time blocked

ReadStage                              0         0         582103         0                 0

MiscStage                              0         0              0         0                 0

CompactionExecutor                    11        11           2868         0                 0

MutationStage                         32   4593678       55057393         0                 0

GossipStage                            1      2818         371487         0                 0

RequestResponseStage                   0         0        4345522         0                 0

ReadRepairStage                        0         0         151473         0                 0

CounterMutationStage                   0         0              0         0                 0

MemtableFlushWriter                    1        81             76         0                 0

MemtablePostFlush                      1       382            139         0                 0

ValidationExecutor                     0         0              0         0                 0

ViewMutationStage                      0         0              0         0                 0

CacheCleanupExecutor                   0         0              0         0                 0

PerDiskMemtableFlushWriter_10          0         0             69         0                 0

PerDiskMemtableFlushWriter_11          0         0             69         0                 0

MemtableReclaimMemory                  0         0             81         0                 0

PendingRangeCalculator                 0         0             32         0                 0

SecondaryIndexManagement               0         0              0         0                 0

HintsDispatcher                        0         0            596         0                 0

PerDiskMemtableFlushWriter_1           0         0             69         0                 0

Native-Transport-Requests             11         0        4547746         0                67

PerDiskMemtableFlushWriter_2           0         0             69         0                 0

MigrationStage                         1      1545            586         0                 0

PerDiskMemtableFlushWriter_0           0         0             80         0                 0

Sampler                                0         0              0         0                 0

PerDiskMemtableFlushWriter_5           0         0             69         0                 0

InternalResponseStage                  0         0          45432         0                 0

PerDiskMemtableFlushWriter_6           0         0             69         0                 0

PerDiskMemtableFlushWriter_3           0         0             69         0                 0

PerDiskMemtableFlushWriter_4           0         0             69         0                 0

PerDiskMemtableFlushWriter_9           0         0             69         0                 0

AntiEntropyStage                       0         0              0         0                 0

PerDiskMemtableFlushWriter_7           0         0             69         0                 0

PerDiskMemtableFlushWriter_8           0         0             69         0                 0



Message type           Dropped

READ                         0

RANGE_SLICE                  0

_TRACE                       0

HINT                         0

MUTATION                     0

COUNTER_MUTATION             0

BATCH_STORE                  0

BATCH_REMOVE                 0

REQUEST_RESPONSE             0

PAGED_RANGE                  0

READ_REPAIR                  0


-dan
The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. If you received it in error please notify us immediately and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria, Freistädterstraße 313<https://maps.google.com/?q=4040+Linz,+Austria,+Freist%C3%A4dterstra%C3%9Fe+313&entry=gmail&source=g>



--
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Other details of note: the OS is CentOS 6.7. To make Cassandra work on this
system, which has an older glibc, I had to downgrade the bundled jna jar
from 4.4.0 to 4.2.2.
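
In case anyone needs to do the same, the swap is just replacing the jar
under Cassandra's lib directory; a sketch, with paths assumed for a typical
package install:

# lib path is an assumption; adjust to wherever your install keeps its jars
cd /usr/share/cassandra/lib
mv jna-4.4.0.jar /tmp/jna-4.4.0.jar.bak   # keep the old jar for rollback
curl -LO https://repo1.maven.org/maven2/net/java/dev/jna/jna/4.2.2/jna-4.2.2.jar
sudo service cassandra restart            # restart so the new jar is picked up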

On Fri, Sep 29, 2017 at 9:47 AM, Dan Kinder <dk...@turnitin.com> wrote:




-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Attached a jstack thread dump from a node that's having trouble (see the
tpstats paste below as well).

Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0         409111         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                    10        10           2481         0                 0
MutationStage                         32   1572402       31515477         0                 0
GossipStage                            1      1620         180091         0                 0
RequestResponseStage                   0         0       17509006         0                 0
ReadRepairStage                        0         0           4833         0                 0
CounterMutationStage                   0         0              0         0                 0
MemtableFlushWriter                    1        44             74         0                 0
MemtablePostFlush                      1       281            158         0                 0
ValidationExecutor                     0         0              0         0                 0
ViewMutationStage                      0         0              0         0                 0
CacheCleanupExecutor                   0         0              0         0                 0
PerDiskMemtableFlushWriter_10          0         0             67         0                 0
PerDiskMemtableFlushWriter_11          0         0             67         0                 0
MemtableReclaimMemory                  0         0             79         0                 0
PendingRangeCalculator                 0         0             19         0                 0
SecondaryIndexManagement               0         0              0         0                 0
HintsDispatcher                        1         1           1456         0                 0
PerDiskMemtableFlushWriter_1           0         0             67         0                 0
Native-Transport-Requests             69         0       17951984         0             26956
PerDiskMemtableFlushWriter_2           0         0             67         0                 0
MigrationStage                         1       201           1048         0                 0
PerDiskMemtableFlushWriter_0           0         0             78         0                 0
Sampler                                0         0              0         0                 0
PerDiskMemtableFlushWriter_5           0         0             67         0                 0
InternalResponseStage                  0         0        2754163         0                 0
PerDiskMemtableFlushWriter_6           0         0             67         0                 0
PerDiskMemtableFlushWriter_3           0         0             67         0                 0
PerDiskMemtableFlushWriter_4           0         0             67         0                 0
PerDiskMemtableFlushWriter_9           0         0             67         0                 0
AntiEntropyStage                       0         0              0         0                 0
PerDiskMemtableFlushWriter_7           0         0             67         0                 0
PerDiskMemtableFlushWriter_8           0         0             67         0                 0


Message type           Dropped

READ                         0

RANGE_SLICE                  0

_TRACE                       0

HINT                         0

MUTATION                     0

COUNTER_MUTATION             0

BATCH_STORE                  0

BATCH_REMOVE                 0

REQUEST_RESPONSE             0

PAGED_RANGE                  0

READ_REPAIR                  0
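
One easy way to see whether the pending counts ever drain is to sample the
suspect stages in a loop; a sketch using only the standard nodetool output:

# Sample the backed-up stages every 30s; Pending should trend down if draining
while true; do
  date
  nodetool tpstats | grep -E '^(MutationStage|GossipStage|ReadStage) '
  sleep 30
done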

On Thu, Sep 28, 2017 at 2:27 PM, Dan Kinder <dk...@turnitin.com> wrote:




-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Unless I'm reading the logs wrong, it doesn't seem like the pauses are very
long... I've attached the full log as well, covering everything since the
node restarted.

$ grep -i "pause\|times" /var/log/cassandra/gc.log.0.current | tail -n 40
2017-09-28T14:18:48.324-0700: 12866.695: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.33 sys=0.00, real=0.02 secs]
2017-09-28T14:18:58.198-0700: 12876.569: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.34 sys=0.00, real=0.03 secs]
2017-09-28T14:19:08.039-0700: 12886.411: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.30 sys=0.00, real=0.03 secs]
2017-09-28T14:19:17.900-0700: 12896.271: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.33 sys=0.00, real=0.03 secs]
2017-09-28T14:19:28.995-0700: 12907.366: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.31 sys=0.00, real=0.02 secs]
2017-09-28T14:19:40.576-0700: 12918.947: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.31 sys=0.01, real=0.02 secs]
2017-09-28T14:19:52.164-0700: 12930.535: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.32 sys=0.01, real=0.02 secs]
2017-09-28T14:20:03.813-0700: 12942.184: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.33 sys=0.00, real=0.02 secs]
2017-09-28T14:20:15.584-0700: 12953.955: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.35 sys=0.00, real=0.02 secs]
2017-09-28T14:20:27.261-0700: 12965.632: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.34 sys=0.00, real=0.03 secs]
2017-09-28T14:20:38.894-0700: 12977.265: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.34 sys=0.01, real=0.03 secs]
2017-09-28T14:20:50.685-0700: 12989.056: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.33 sys=0.00, real=0.03 secs]
2017-09-28T14:21:02.281-0700: 13000.652: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.31 sys=0.00, real=0.02 secs]
2017-09-28T14:21:13.975-0700: 13012.346: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.32 sys=0.00, real=0.02 secs]
2017-09-28T14:21:25.797-0700: 13024.168: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.32 sys=0.01, real=0.02 secs]
2017-09-28T14:21:37.395-0700: 13035.766: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.33 sys=0.00, real=0.02 secs]
2017-09-28T14:21:49.166-0700: 13047.537: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.32 sys=0.00, real=0.02 secs]
2017-09-28T14:22:00.887-0700: 13059.258: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.33 sys=0.00, real=0.03 secs]
2017-09-28T14:22:12.453-0700: 13070.824: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.34 sys=0.00, real=0.02 secs]
2017-09-28T14:22:24.023-0700: 13082.394: [GC pause (G1 Evacuation Pause)
(young)
 [Times: user=0.35 sys=0.00, real=0.02 secs]
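
To rule out long pauses hiding earlier in the log, the real= field can be
filtered directly; a plain-awk sketch against the same gc.log format shown
above (the 0.5s cutoff is arbitrary):

# Print any [Times: ...] line whose wall-clock (real) time exceeds 0.5s
awk '/real=/ { split($0, a, "real="); split(a[2], b, " ");
               if (b[1] + 0 > 0.5) print }' /var/log/cassandra/gc.log.0.current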


On Thu, Sep 28, 2017 at 2:13 PM, Jeff Jirsa <jj...@gmail.com> wrote:



-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

Re:

Posted by Jeff Jirsa <jj...@gmail.com>.
The digest mismatch exception is not a problem; that's why it's only logged
at debug.

As Thomas noted, there's a pretty good chance this is
https://issues.apache.org/jira/browse/CASSANDRA-13754 - if you see a lot of
GCInspector logs indicating GC pauses, that would add confidence to that
diagnosis.
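
A quick way to check for those, assuming the default log location (a sketch
only):

# GCInspector logs collections over ~200ms at INFO and over 1s at WARN by default
grep GCInspector /var/log/cassandra/system.log | tail -n 20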


On Thu, Sep 28, 2017 at 2:08 PM, Dan Kinder <dk...@turnitin.com> wrote:

> Thanks for the responses.
>
> @Prem yes this is after the entire cluster is on 3.11, but no I did not
> run upgradesstables yet.
>
> @Thomas no I don't see any major GC going on.
>
> @Jeff yeah it's fully upgraded. I decided to shut the whole thing down and
> bring it back (thankfully this cluster is not serving live traffic). The
> nodes seemed okay for an hour or two, but I see the issue again, without me
> bouncing any nodes. This time it's ReadStage that's building up, and the
> exception I'm seeing in the logs is:
>
> DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206 ReadCallback.java:242
> - Digest mismatch:
>
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(6150926370328526396, 696a6374652e6f7267) (
> 2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)
>
>         at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>
>         at org.apache.cassandra.service.ReadCallback$
> AsyncRepairRunner.run(ReadCallback.java:233)
> ~[apache-cassandra-3.11.0.jar:3.11.0]
>
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_71]
>
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_71]
>
>         at org.apache.cassandra.concurrent.NamedThreadFactory.
> lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
> [apache-cassandra-3.11.0.jar:3.11.0]
>
>         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]
>
>
> Do you think running upgradesstables would help? Or relocatesstables? I
> presumed it shouldn't be necessary for Cassandra to function, just an
> optimization.
>
> On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <
> thomas.steinmaurer@dynatrace.com> wrote:
>
>> Dan,
>>
>>
>>
>> do you see any major GC? We have been hit by the following memory leak in
>> our loadtest environment with 3.11.0.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-13754
>>
>>
>>
>> So, depending on the heap size and uptime, you might get into heap
>> troubles.
>>
>>
>>
>> Thomas

Re:

Posted by Dan Kinder <dk...@turnitin.com>.
Thanks for the responses.

@Prem yes, this is after the entire cluster is on 3.11, but no, I have not
run upgradesstables yet.
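
(As an aside, a rough way I can gauge how much data is still on the old
format is the sstable version prefix in the filenames: if I remember right,
2.2 writes "la"/"lb" files while 3.x writes "ma"/"mb"/"mc". The data
directory below is the package default, so adjust for your install.)

$ find /var/lib/cassandra/data -name 'l[ab]-*-Data.db' | wc -l   # still on the 2.2 format
$ find /var/lib/cassandra/data -name 'm[abc]-*-Data.db' | wc -l  # already rewritten for 3.x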

@Thomas, no, I don't see any major GC going on.

@Jeff yeah, it's fully upgraded. I decided to shut the whole cluster down
and bring it back up (thankfully this cluster is not serving live traffic).
The nodes seemed okay for an hour or two, but now I'm seeing the issue
again without having bounced any nodes. This time it's ReadStage that's
backing up, and the exception I'm seeing in the logs is:

DEBUG [ReadRepairStage:106] 2017-09-28 13:01:37,206 ReadCallback.java:242 -
Digest mismatch:

org.apache.cassandra.service.DigestMismatchException: Mismatch for key
DecoratedKey(6150926370328526396, 696a6374652e6f7267)
(2f0fffe2d743cdc4c69c3eb351a3c9ca vs 00ee661ae190c2cbf0eb2fb8a51f6025)

        at
org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
~[apache-cassandra-3.11.0.jar:3.11.0]

        at
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
~[apache-cassandra-3.11.0.jar:3.11.0]

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_71]

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_71]

        at
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
[apache-cassandra-3.11.0.jar:3.11.0]

        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_71]
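
(Side note, in case anyone wants to map that key back to a partition: for a
simple partition key, the second field in DecoratedKey(...) is just the raw
key bytes in hex, so it decodes with e.g.:)

$ echo 696a6374652e6f7267 | xxd -r -p
ijcte.org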


Do you think running upgradesstables would help? Or relocatesstables? I
had presumed neither is necessary for Cassandra to function, just an
optimization.
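
(If one of those is worth a shot, my plan would be to run it one node at a
time, roughly:)

$ nodetool upgradesstables    # rewrite any sstables still on an older format
$ nodetool relocatesstables   # redistribute sstables across data directories (added in 3.2, I believe)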

On Thu, Sep 28, 2017 at 12:49 PM, Steinmaurer, Thomas <
thomas.steinmaurer@dynatrace.com> wrote:

> Dan,
>
>
>
> Do you see any major GC? We have been hit by the following memory leak in
> our loadtest environment with 3.11.0:
>
> https://issues.apache.org/jira/browse/CASSANDRA-13754
>
> So, depending on heap size and uptime, you might run into heap trouble.
>
>
>
> Thomas
>
> [Dan's original message quoted in full; trimmed, see the start of this
> thread]



-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkinder@turnitin.com

RE:

Posted by "Steinmaurer, Thomas" <th...@dynatrace.com>.
Dan,

Do you see any major GC? We have been hit by the following memory leak in our loadtest environment with 3.11.0:
https://issues.apache.org/jira/browse/CASSANDRA-13754

So, depending on heap size and uptime, you might run into heap trouble.
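
(A quick way to check on your side: Cassandra itself logs long GC pauses
via GCInspector, so grepping the node's system log should show them. The
path below is the package default, adjust for your install.)

$ grep GCInspector /var/log/cassandra/system.log | tail -20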

Thomas

From: Dan Kinder [mailto:dkinder@turnitin.com]
Sent: Thursday, 28 September 2017 18:20
To: user@cassandra.apache.org
Subject:


[Dan's original message quoted in full; trimmed, see the start of this thread]