You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Aaron Turner <sy...@gmail.com> on 2012/06/23 03:40:14 UTC

One node hammered?

So trying to understand why I'm seeing one or two nodes getting
hammered when I have a batch job running.  By "hammered" I mean Disk
IO load is at or near 100% and CPU load is around 10.0-17.0 while the
vast majority of my nodes have a CPU load between 1-2.  The node(s)
hammered seems to move around during the job.  A few minutes on one
node, then it stabilizes and the problem moves to a different one or
two.  Sometimes the load across all my nodes is back to 1-2 even
though the job is running just as quickly as before.

The job at hand, takes a row KeyA in one CF (like InterfaceDaily), and
writes it as a single column in another CF (like
InterfaceDailyCompressed) to row KeyB and then deletes the original
row KeyA.  Using the random partitioner so I would think that read &
writes would be well distributed over my 9 nodes (RF=3).  The amount
of data read and written is pretty small (2304 bytes written read from
288 cols x 8 bytes each).  My job is multi-threaded, but I've reduced
the number of threads from 20 to 5 and while things seem a little
better, it's hardly 4x better and things are taking about 4x as long.

Client is using Hector 1.0.5 and servers are running Cassandra 1.0.9.

On the effected node will start generating logs like this:

 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,626 MessagingService.java
(line 607) 616 READ messages dropped in last 5000ms
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 50) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 65) ReadStage                        16      4628         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 65) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 65) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 70) CompactionManager                 0         0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 82) MessagingService                n/a       0,0
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 86) ColumnFamily                Memtable ops,data  Row cache
size/cap  Key cache size/cap
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 89) system.NodeIdInfo                         0,0
 0/0                 1/1
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.IndexInfo                          0,0
 0/0                 0/1
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.LocationInfo                       0,0
 0/0                 1/1
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.Versions                           0,0
 0/0                 0/3
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.Migrations                         0,0
 0/0                 0/1
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.HintsColumnFamily                  0,0
 0/0                 1/1
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.Schema                             0,0
 0/0                 1/1
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.MetroFabricDailyAgg     124884,59346265
   0/0         8228/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.FabricDailyAggCompressed                 0,0
        0/0            0/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.IbxFabricDailyAgg     116794,45801943
 0/0        16311/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.DeviceDailyAgg        121624,73123817
 0/0        61037/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.IbxFabricDailyAggCompressed        6016,8819380
           0/0         1317/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.Meta                    2094,16880922
 0/0         8768/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.FabricDailyAgg                    0,0
 0/0            0/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.InterfaceDailyCompressed      50900,57986845
        0/0       200000/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.DeviceDailyAggCompressed      35282,59557567
        0/0         7629/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.TrunklessDeviceDailyAgg      69603,47157326
       0/0        31600/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.LastStat                    32,936460
 0/0        19777/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.RegionFabricDailyAggCompressed         686,1191802
              0/0          147/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.TrunklessDeviceDailyAggCompressed
14144,23056640                 0/0         3058/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.RegionFabricDailyAgg      94085,12703937
    0/0         1627/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.DeviceFabricDailyAgg       10779,5941110
    0/0        31033/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.InterfaceDaily       280817,145511376
 0/0       200000/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.DeviceFabricDailyAggCompressed      15179,20592356
              0/0         3229/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.MetroFabricDailyAggCompressed        3032,4568080
             0/0          642/200000
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.rollups60          218594,158987534
 0/0               50/50
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.events_timeline                 0,0
 0/0                 0/5
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.rollups300           91185,71822435
 0/0               32/50
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.pdps                312736,62115027
 0/0               0/300
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.events                          0,0
 0/0                6/50
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.settings                        0,0
 0/0                0/50
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,633 StatusLogger.java
(line 89) OpsCenter.rollups7200            3908,3024712
 0/0                0/50
 INFO [ScheduledTasks:1] 2012-06-22 17:55:18,633 StatusLogger.java
(line 89) OpsCenter.rollups86400             796,608956
 0/0                0/50

Looking at the logs, I don't see any compactions or repairs going on.
Anyways, how should I debug this further to see what is causing this?


-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"