Posted to commits@cassandra.apache.org by "Flavien Charlon (JIRA)" <ji...@apache.org> on 2014/12/21 00:36:13 UTC
[jira] [Created] (CASSANDRA-8529) Cassandra suddenly stops responding to clients though process is still running

Flavien Charlon created CASSANDRA-8529:
------------------------------------------

             Summary: Cassandra suddenly stops responding to clients though process is still running
                 Key: CASSANDRA-8529
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8529
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Flavien Charlon


I am running a moderate write-only load against a 3-node cluster.

After some time, nodes become completely unresponsive to clients, even though the process is still running.

tpstats on the affected nodes shows pending compaction tasks that never get executed.
This is the tpstats output on an affected node, hours after the load stopped:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0              0         0                 0
ReadStage                         0         0         243384         0                 0
RequestResponseStage              0         0        3336833         0                 0
MutationStage                    32      1902        4775909         0                 0
ReadRepairStage                   0         0          14445         0                 0
GossipStage                       0         0         128499         0                 0
CacheCleanupExecutor              0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             36         0                 0
ValidationExecutor                0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
MiscStage                         0         0              0         0                 0
MemtableFlushWriter               2         7            947         0                 0
MemtableReclaimMemory             0         0            947         0                 0
PendingRangeCalculator            0         0              5         0                 0
MemtablePostFlush                 1         8           1241         0                 0
CompactionExecutor                2         8           1035         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0              6         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0
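
For anyone trying to spot the same condition on their own nodes, here is a minimal sketch that scans saved tpstats text for pools with a non-zero Pending count. It assumes the six-column layout shown above; the function name pools_with_backlog and the SAMPLE constant are hypothetical, and the sample rows are a subset copied from the output in this report.

```python
import re

# Subset of the tpstats rows from this report, used as sample input.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         243384         0                 0
MutationStage                    32      1902        4775909         0                 0
MemtableFlushWriter               2         7            947         0                 0
MemtablePostFlush                 1         8           1241         0                 0
CompactionExecutor                2         8           1035         0                 0
"""

# A pool row is a name followed by exactly five integer columns.
# The header row and the two-column "Dropped" rows do not match.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def pools_with_backlog(tpstats_text):
    """Return {pool_name: (active, pending)} for pools with pending > 0."""
    backlog = {}
    for line in tpstats_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        active, pending = int(match.group(2)), int(match.group(3))
        if pending > 0:
            backlog[match.group(1)] = (active, pending)
    return backlog


if __name__ == "__main__":
    for name, (active, pending) in pools_with_backlog(SAMPLE).items():
        print(f"{name}: active={active} pending={pending}")
```

Running this on two snapshots taken some time apart and comparing the Completed column as well would show whether a pool is merely busy or actually stuck; the sketch only reports the backlog in a single snapshot.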

Also, compactionstats shows that compaction is stalled:

   compaction type   keyspace          table   completed       total    unit   progress
        Compaction    testnet   transactions   117833347   117834891   bytes    100.00%
        Compaction    testnet        scripts   206418064   206419414   bytes    100.00%
Active compaction remaining time :   0h00m00s

Again, the node has been in this state for hours.
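
To make the stall easier to see, the following sketch computes the bytes still outstanding for each compaction from the compactionstats rows above (total minus completed). It assumes the seven-column row layout shown in this report; the function name remaining_bytes and the SAMPLE constant are hypothetical.

```python
# compactionstats rows copied from this report, used as sample input.
SAMPLE = """\
   compaction type   keyspace          table   completed       total    unit   progress
        Compaction    testnet   transactions   117833347   117834891   bytes    100.00%
        Compaction    testnet        scripts   206418064   206419414   bytes    100.00%
"""


def remaining_bytes(compactionstats_text):
    """Return {"keyspace.table": total - completed} for each data row."""
    remaining = {}
    for line in compactionstats_text.splitlines():
        parts = line.split()
        # Data rows have 7 fields: type, keyspace, table, completed,
        # total, unit, progress. The header splits into 8 and is skipped.
        if len(parts) == 7 and parts[3].isdigit() and parts[4].isdigit():
            key = f"{parts[1]}.{parts[2]}"
            remaining[key] = int(parts[4]) - int(parts[3])
    return remaining


if __name__ == "__main__":
    for table, left in remaining_bytes(SAMPLE).items():
        print(f"{table}: {left} bytes remaining")
```

For the rows above this gives 1544 bytes for testnet.transactions and 1350 bytes for testnet.scripts, so the 100.00% progress figure appears to be a rounded value for compactions that are a few kilobytes short of finishing.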

I have reproduced this on several clusters with various memory configurations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)