Posted to user@cassandra.apache.org by Haithem Jarraya <ha...@struq.com> on 2013/06/14 14:01:28 UTC

Thrift threads proliferation

Hi All,

We are facing a very strange issue in our C* ring. We are running C* v1.2.4
with 7 nodes in DC1, 3 nodes in DC2 and 3 nodes in DC3.
We have been testing read/write performance in DC1 with different disk
configurations.
For instance, node1-DC1 uses JBOD and node2-DC1 uses RAID-0.
Over the last week everything seemed to be running fine, until yesterday,
when node2-DC1 (the RAID-0 node) stopped responding to client requests and
queries started timing out.
The JMX console showed up to 25k Thrift threads running, no pending
compactions and a lot of pending reads, and that's it. CPU is averaging 10%
and heap usage is about 4GB of the 8GB available.
Node2-DC1 became unresponsive, but the other nodes were still trying to
query it, and Gossip did not flag it as dead or unresponsive. I am
wondering if this is a bug.

The log file on node2-DC1 shows the following after stopping Thrift:

 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,433 StatusLogger.java (line 53) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,564 StatusLogger.java (line 68) ReadStage                        30       959         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,564 StatusLogger.java (line 68) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,565 StatusLogger.java (line 68) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,565 StatusLogger.java (line 68) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,566 StatusLogger.java (line 68) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,566 StatusLogger.java (line 68) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,567 StatusLogger.java (line 68) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,567 StatusLogger.java (line 68) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,568 StatusLogger.java (line 68) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,568 StatusLogger.java (line 68) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,569 StatusLogger.java (line 68) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,569 StatusLogger.java (line 68) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,570 StatusLogger.java (line 68) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,570 StatusLogger.java (line 68) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,572 StatusLogger.java (line 73) CompactionManager                 0         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,573 StatusLogger.java (line 85) MessagingService                n/a      0,42
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,574 StatusLogger.java (line 95) Cache Type                     Size                 Capacity     KeysToSave     Provider
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,574 StatusLogger.java (line 96) KeyCache                  602369792               1048576000            all
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,574 StatusLogger.java (line 102) RowCache                          0                        0            all
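
For anyone who wants to pull the backlogged pools out of a StatusLogger dump
like the one above, here is a minimal Python sketch. It is only an
illustration: the function names and the pending threshold of 100 are
assumptions, not anything Cassandra defines, and the regular expression is
based solely on the line layout seen in this log. It also rejoins records
that a mail client has wrapped after "(line".

```python
import re

# Matches one StatusLogger thread-pool record: name, active, pending, blocked.
# The layout is inferred from the log in this post, not from Cassandra's source.
POOL_LINE = re.compile(
    r"StatusLogger\.java \(line\s+\d+\)\s+(?P<pool>\S+)\s+"
    r"(?P<active>\d+)\s+(?P<pending>\d+)\s+(?P<blocked>\d+)\s*$"
)


def unwrap(text):
    """Rejoin records that were wrapped onto a new line after '(line'."""
    return re.sub(r"\(line\s*\n\s*", "(line ", text)


def backlogged_pools(text, pending_threshold=100):
    """Return {pool: (active, pending, blocked)} for pools whose pending
    count exceeds pending_threshold (a hypothetical cut-off)."""
    result = {}
    for line in unwrap(text).splitlines():
        match = POOL_LINE.search(line)
        if match and int(match.group("pending")) > pending_threshold:
            result[match.group("pool")] = (
                int(match.group("active")),
                int(match.group("pending")),
                int(match.group("blocked")),
            )
    return result


SAMPLE = """\
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,564 StatusLogger.java (line
68) ReadStage                        30       959         0
 INFO [ScheduledTasks:1] 2013-06-14 09:29:37,565 StatusLogger.java (line
68) MutationStage                     0         0         0
"""

print(backlogged_pools(SAMPLE))
```

On the two sample records, only ReadStage is reported, which matches what
the log shows: reads are queuing while every other stage is idle.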


Any hints for tracking down this issue would be appreciated.


Many Thanks,

Haithem