You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Aaron Turner <sy...@gmail.com> on 2012/06/23 03:40:14 UTC
One node hammered?
So trying to understand why I'm seeing one or two nodes getting
hammered when I have a batch job running. By "hammered" I mean Disk
IO load is at or near 100% and CPU load is around 10.0-17.0 while the
vast majority of my nodes have a CPU load between 1-2. The node(s)
hammered seems to move around during the job. A few minutes on one
node, then it stabilizes and the problem moves to a different one or
two. Sometimes the load across all my nodes is back to 1-2 even
though the job is running just as quickly as before.
The job at hand, takes a row KeyA in one CF (like InterfaceDaily), and
writes it as a single column in another CF (like
InterfaceDailyCompressed) to row KeyB and then deletes the original
row KeyA. Using the random partitioner so I would think that read &
writes would be well distributed over my 9 nodes (RF=3). The amount
of data read and written is pretty small (2304 bytes written read from
288 cols x 8 bytes each). My job is multi-threaded, but I've reduced
the number of threads from 20 to 5 and while things seem a little
better, it's hardly 4x better and things are taking about 4x as long.
Client is using Hector 1.0.5 and servers are running Cassandra 1.0.9.
On the effected node will start generating logs like this:
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,626 MessagingService.java
(line 607) 616 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 50) Pool Name Active Pending Blocked
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 65) ReadStage 16 4628 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 65) RequestResponseStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,627 StatusLogger.java
(line 65) ReadRepairStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) MutationStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) GossipStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) StreamStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,628 StatusLogger.java
(line 65) MemtablePostFlusher 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) FlushWriter 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 65) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 70) CompactionManager 0 0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 82) MessagingService n/a 0,0
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 86) ColumnFamily Memtable ops,data Row cache
size/cap Key cache size/cap
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,629 StatusLogger.java
(line 89) system.NodeIdInfo 0,0
0/0 1/1
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.IndexInfo 0,0
0/0 0/1
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.LocationInfo 0,0
0/0 1/1
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.Versions 0,0
0/0 0/3
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.Migrations 0,0
0/0 0/1
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.HintsColumnFamily 0,0
0/0 1/1
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) system.Schema 0,0
0/0 1/1
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.MetroFabricDailyAgg 124884,59346265
0/0 8228/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.FabricDailyAggCompressed 0,0
0/0 0/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.IbxFabricDailyAgg 116794,45801943
0/0 16311/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,630 StatusLogger.java
(line 89) spider3.DeviceDailyAgg 121624,73123817
0/0 61037/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.IbxFabricDailyAggCompressed 6016,8819380
0/0 1317/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.Meta 2094,16880922
0/0 8768/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.FabricDailyAgg 0,0
0/0 0/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.InterfaceDailyCompressed 50900,57986845
0/0 200000/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.DeviceDailyAggCompressed 35282,59557567
0/0 7629/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.TrunklessDeviceDailyAgg 69603,47157326
0/0 31600/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.LastStat 32,936460
0/0 19777/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.RegionFabricDailyAggCompressed 686,1191802
0/0 147/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.TrunklessDeviceDailyAggCompressed
14144,23056640 0/0 3058/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,631 StatusLogger.java
(line 89) spider3.RegionFabricDailyAgg 94085,12703937
0/0 1627/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.DeviceFabricDailyAgg 10779,5941110
0/0 31033/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.InterfaceDaily 280817,145511376
0/0 200000/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.DeviceFabricDailyAggCompressed 15179,20592356
0/0 3229/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) spider3.MetroFabricDailyAggCompressed 3032,4568080
0/0 642/200000
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.rollups60 218594,158987534
0/0 50/50
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.events_timeline 0,0
0/0 0/5
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.rollups300 91185,71822435
0/0 32/50
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.pdps 312736,62115027
0/0 0/300
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.events 0,0
0/0 6/50
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,632 StatusLogger.java
(line 89) OpsCenter.settings 0,0
0/0 0/50
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,633 StatusLogger.java
(line 89) OpsCenter.rollups7200 3908,3024712
0/0 0/50
INFO [ScheduledTasks:1] 2012-06-22 17:55:18,633 StatusLogger.java
(line 89) OpsCenter.rollups86400 796,608956
0/0 0/50
Looking at the logs, I don't see any compactions or repairs going on.
Anyways, how should I debug this further to see what is causing this?
--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"