Posted to user@cassandra.apache.org by Benjamin Black <b...@b3k.us> on 2010/08/01 00:45:40 UTC
Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Sorry, I just noticed: are you running 14 instances of Cassandra on a
single physical machine or are all those java processes something
else?
On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall <da...@gmail.com> wrote:
> I have 4 nodes on enterprise-type hardware (lots of RAM: 12 GB, 16 i7 cores,
> RAID disks).
>
> ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats
> Pool Name                  Active  Pending  Completed
> STREAM-STAGE                    0        0          0
> RESPONSE-STAGE                  0        0     516280
> ROW-READ-STAGE                  8     4096    1164326
> LB-OPERATIONS                   0        0          0
> MESSAGE-DESERIALIZER-POOL       1   682008    1818682
> GMFD                            0        0       6467
> LB-TARGET                       0        0          0
> CONSISTENCY-MANAGER             0        0     661477
> ROW-MUTATION-STAGE              0        0     998780
> MESSAGE-STREAMING-POOL          0        0          0
> LOAD-BALANCER-STAGE             0        0          0
> FLUSH-SORTER-POOL               0        0          0
> MEMTABLE-POST-FLUSHER           0        0          4
> FLUSH-WRITER-POOL               0        0          4
> AE-SERVICE-STAGE                0        0          0
> HINTED-HANDOFF-POOL             0        0          3
>
> EQX root@cass04:~# vmstat -n 1
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b  swpd   free   buff    cache  si so   bi  bo    in    cs us sy id wa st
>  6 10  7096 121816  16244 10375492   0  0    1   3     0     0  5  1 94  0  0
>  2 10  7096 116484  16248 10381144   0  0 5636   4 21210  9820  2  1 79 18  0
>  1  9  7096 108920  16248 10387592   0  0 6216   0 21439  9878  2  1 81 16  0
>  0  9  7096 129108  16248 10364852   0  0 6024   0 23280  8753  2  1 80 17  0
>  2  9  7096 122460  16248 10370908   0  0 6072   0 20835  9461  2  1 83 14  0
>  2  8  7096 115740  16260 10375752   0  0 5168 292 21049  9511  3  1 77 20  0
>  1 10  7096 108424  16260 10382300   0  0 6244   0 21483  8981  2  1 75 22  0
>  3  8  7096 125028  16260 10364104   0  0 5584   0 21238  9436  2  1 81 16  0
>  3  9  7096 117928  16260 10370064   0  0 5988   0 21505 10225  2  1 77 19  0
>  1  8  7096 109544  16260 10376640   0  0 6340  28 20840  8602  2  1 80 18  0
>  0  9  7096 127028  16240 10357652   0  0 5984   0 20853  9158  2  1 79 18  0
>  9  0  7096 121472  16240 10363492   0  0 5716   0 20520  8489  1  1 82 16  0
>  3  9  7096 112668  16240 10369872   0  0 6404   0 21314  9459  2  1 84 13  0
>  1  9  7096 127300  16236 10353440   0  0 5684   0 38914 10068  2  1 76 21  0
>
>
> But the 16 cores are hardly utilized, which suggests to me there is some
> bad thread thrashing. But why?
>
>
>
>  1 [|||||          8.3%]   Tasks: 1070 total, 1 running
>  2 [               0.0%]   Load average: 8.34 9.05 8.82
>  3 [               0.0%]   Uptime: 192 days(!), 15:29:52
>  4 [|||||||||||   17.9%]
>  5 [|||||          5.7%]
>  6 [||             1.3%]
>  7 [||             2.6%]
>  8 [|              0.6%]
>  9 [|              0.6%]
> 10 [||             1.9%]
> 11 [||             1.9%]
> 12 [||             1.9%]
> 13 [||             1.3%]
> 14 [|              0.6%]
> 15 [||             1.3%]
> 16 [|              0.6%]
> Mem[||||||||||||||||||||||||||||||||||||||||||||1791/12028MB]
> Swp[| 6/1983MB]
>
>   PID USER PRI NI  VIRT   RES   SHR S CPU% MEM%   TIME+ Command
> 30269 root  40  0 14100  2116   900 R  4.0  0.0 0:00.49 htop
> 24878 root  40  0 20.6G 8345M 6883M D  3.0 69.4 1:23.03 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24879 root  40  0 20.6G 8345M 6883M D  3.0 69.4 1:22.93 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24874 root  40  0 20.6G 8345M 6883M D  2.0 69.4 1:22.73 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24880 root  40  0 20.6G 8345M 6883M D  2.0 69.4 1:22.93 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24875 root  40  0 20.6G 8345M 6883M D  2.0 69.4 1:23.17 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24658 root  40  0 20.6G 8345M 6883M D  2.0 69.4 1:23.06 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24877 root  40  0 20.6G 8345M 6883M S  2.0 69.4 1:23.43 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24873 root  40  0 20.6G 8345M 6883M D  1.0 69.4 1:23.65 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24876 root  40  0 20.6G 8345M 6883M S  1.0 69.4 1:23.62 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24942 root  40  0 20.6G 8345M 6883M S  1.0 69.4 0:23.50 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24943 root  40  0 20.6G 8345M 6883M S  0.0 69.4 0:29.53 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24933 root  40  0 20.6G 8345M 6883M S  0.0 69.4 0:22.57 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 24939 root  40  0 20.6G 8345M 6883M S  0.0 69.4 0:12.73 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> 25280 root  40  0 20.6G 8345M 6883M S  0.0 69.4 0:00.10 /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>
Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Posted by Jonathan Ellis <jb...@gmail.com>.
Let's centralize that conversation over on
https://issues.apache.org/jira/browse/CASSANDRA-1358
On Wed, Aug 4, 2010 at 4:32 PM, Mike Malone <mi...@simplegeo.com> wrote:
> So after 4096 messages get pushed on the row-read-stage queue (or any other
> multiThreadedStage) the deserializer basically becomes a single-threaded
> blocking queue that prevents any other inter-node RPC from occurring..?
> Sounds like it's a problem either way. If the row read stage is what's
> backed up, why not have the messages stack up on that stage?
> Mike
>
> On Wed, Aug 4, 2010 at 11:46 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> No, MDP is backing up because Row-Read-Stage [the stage after MDP on
>> reads] is full at 4096, meaning you're not able to process reads as
>> quickly as the requests are coming in.
>>
>> On Wed, Aug 4, 2010 at 2:21 PM, Mike Malone <mi...@simplegeo.com> wrote:
>> > This may be your
>> > problem: https://issues.apache.org/jira/browse/CASSANDRA-1358
>> > The message deserializer executor is being created with a core pool
>> > size of 1. Since it uses a queue with unbounded capacity new requests
>> > are always queued and the thread pool never grows. So the message
>> > deserializer becomes a single-threaded bottleneck through which all
>> > traffic must pass. So your 16 cores are reduced to one core for
>> > handling all inter-node communication (and any intra-node communication
>> > that's being passed through the messaging service).
>> > Mike
>> >
>> > On Tue, Aug 3, 2010 at 10:02 PM, Dathan Pattishall <da...@gmail.com>
>> > wrote:
>> >>
>> >> The output of htop shows threads as procs with a breakdown of how much
>> >> cpu /etc per thread (in ncurses color!). All of these Java "procs" are
>> >> just Java threads of only 1 instance of Cassandra per Server.
>> >>
>> >> On Sat, Jul 31, 2010 at 3:45 PM, Benjamin Black <b...@b3k.us> wrote:
>> >>>
>> >>> Sorry, I just noticed: are you running 14 instances of Cassandra on a
>> >>> single physical machine or are all those java processes something
>> >>> else?
>> >>>
>> >>> On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall
>> >>> <da...@gmail.com>
>> >>> wrote:
>> >>> > I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7
>> >>> > cores,
>> >>> > RAID Disks).
>> >>> >
>> >>> > [...]
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Posted by Mike Malone <mi...@simplegeo.com>.
So after 4096 messages get pushed on the row-read-stage queue (or any other
multiThreadedStage) the deserializer basically becomes a single-threaded
blocking queue that prevents any other inter-node RPC from occurring..?
Sounds like it's a problem either way. If the row read stage is what's
backed up, why not have the messages stack up on that stage?
Mike
On Wed, Aug 4, 2010 at 11:46 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> No, MDP is backing up because Row-Read-Stage [the stage after MDP on
> reads] is full at 4096, meaning you're not able to process reads as
> quickly as the requests are coming in.
>
> On Wed, Aug 4, 2010 at 2:21 PM, Mike Malone <mi...@simplegeo.com> wrote:
> > This may be your problem:
> > https://issues.apache.org/jira/browse/CASSANDRA-1358
> > The message deserializer executor is being created with a core pool
> > size of 1. Since it uses a queue with unbounded capacity new requests
> > are always queued and the thread pool never grows. So the message
> > deserializer becomes a single-threaded bottleneck through which all
> > traffic must pass. So your 16 cores are reduced to one core for
> > handling all inter-node communication (and any intra-node communication
> > that's being passed through the messaging service).
> > Mike
> >
> > On Tue, Aug 3, 2010 at 10:02 PM, Dathan Pattishall <da...@gmail.com>
> > wrote:
> >>
> >> The output of htop shows threads as procs with a breakdown of how much
> >> cpu /etc per thread (in ncurses color!). All of these Java "procs" are
> >> just Java threads of only 1 instance of Cassandra per Server.
> >>
> >> On Sat, Jul 31, 2010 at 3:45 PM, Benjamin Black <b...@b3k.us> wrote:
> >>>
> >>> Sorry, I just noticed: are you running 14 instances of Cassandra on a
> >>> single physical machine or are all those java processes something
> >>> else?
> >>>
> >>> On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall <
> dathanvp@gmail.com>
> >>> wrote:
> >>> > I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7
> >>> > cores,
> >>> > RAID Disks).
> >>> >
> >>> > [...]
> >>
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Posted by Jonathan Ellis <jb...@gmail.com>.
No, MDP is backing up because Row-Read-Stage [the stage after MDP on
reads] is full at 4096, meaning you're not able to process reads as
quickly as the requests are coming in.
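Jonathan's point can be reproduced with plain java.util.concurrent primitives: when a downstream stage with a bounded queue (like ROW-READ-STAGE at 4096) fills up, a single-threaded upstream feeder blocks on the handoff, and work piles up in the feeder's own unbounded queue. A toy sketch of that coupling, under illustrative stage names and sizes rather than Cassandra's actual code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class StageBackpressure {
    public static void main(String[] args) throws Exception {
        // "ROW-READ-STAGE": bounded queue (4096 in Cassandra; 4 here), with a
        // rejection handler that blocks the submitter when the queue is full.
        ThreadPoolExecutor readStage = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(4),
                (task, pool) -> {
                    try {
                        pool.getQueue().put(task); // block until space frees up
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });

        // "MESSAGE-DESERIALIZER-POOL": one thread, unbounded queue.
        ThreadPoolExecutor deserializer = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        for (int i = 0; i < 50; i++) {
            deserializer.submit(() -> readStage.submit(() -> {
                try { Thread.sleep(20); } catch (InterruptedException e) { } // slow "read"
            }));
        }

        Thread.sleep(100);
        // Reads can't keep up, the read stage stays full, the single
        // deserializer thread blocks handing off, and its own Pending
        // count grows -- the same shape as the tpstats output above.
        System.out.println("deserializer pending = " + deserializer.getQueue().size());

        deserializer.shutdownNow();
        readStage.shutdownNow();
    }
}
```

On this reading, the 682008 Pending on MESSAGE-DESERIALIZER-POOL is a symptom; the read stage saturating at its 4096 cap is the cause.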
On Wed, Aug 4, 2010 at 2:21 PM, Mike Malone <mi...@simplegeo.com> wrote:
> This may be your problem:
> https://issues.apache.org/jira/browse/CASSANDRA-1358
> The message deserializer executor is being created with a core pool size of
> 1. Since it uses a queue with unbounded capacity new requests are always
> queued and the thread pool never grows. So the message deserializer becomes
> a single-threaded bottleneck through which all traffic must pass. So your 16
> cores are reduced to one core for handling all inter-node communication (and
> any intra-node communication that's being passed through the messaging
> service).
> Mike
>
> On Tue, Aug 3, 2010 at 10:02 PM, Dathan Pattishall <da...@gmail.com>
> wrote:
>>
>> The output of htop shows threads as procs with a breakdown of how much cpu
>> /etc per thread (in ncurses color!). All of these Java "procs" are just Java
>> threads of only 1 instance of Cassandra per Server.
>>
>> On Sat, Jul 31, 2010 at 3:45 PM, Benjamin Black <b...@b3k.us> wrote:
>>>
>>> Sorry, I just noticed: are you running 14 instances of Cassandra on a
>>> single physical machine or are all those java processes something
>>> else?
>>>
>>> On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall <da...@gmail.com>
>>> wrote:
>>> > I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7
>>> > cores,
>>> > RAID Disks).
>>> >
>>> > [...]
>>
>
>
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Posted by Mike Malone <mi...@simplegeo.com>.
This may be your problem:
https://issues.apache.org/jira/browse/CASSANDRA-1358
The message deserializer executor is being created with a core pool size of
1. Since it uses a queue with unbounded capacity new requests are always
queued and the thread pool never grows. So the message deserializer becomes
a single-threaded bottleneck through which all traffic must pass. So your 16
cores are reduced to one core for handling all inter-node communication (and
any intra-node communication that's being passed through the messaging
service).
Mike
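The behavior Mike describes falls directly out of java.util.concurrent.ThreadPoolExecutor semantics: threads beyond the core pool size are only created when the work queue rejects an offer, and an unbounded LinkedBlockingQueue never rejects. A minimal standalone sketch (not Cassandra's actual code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueuePool {
    public static void main(String[] args) throws Exception {
        // Core size 1, max size 16 -- but with an unbounded queue the pool
        // never grows past 1, because the queue always accepts the task.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 16, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                try { Thread.sleep(10); } catch (InterruptedException e) { }
            });
        }

        // One worker and a long backlog: a single-threaded bottleneck.
        System.out.println("pool size = " + pool.getPoolSize());
        System.out.println("pending   = " + pool.getQueue().size());

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With an unbounded queue, raising maximumPoolSize changes nothing; only a larger *core* pool size (the direction of the change tracked in CASSANDRA-1358) actually adds deserializer threads.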
On Tue, Aug 3, 2010 at 10:02 PM, Dathan Pattishall <da...@gmail.com>wrote:
> The output of htop shows threads as procs with a breakdown of how much cpu
> /etc per thread (in ncurses color!). All of these Java "procs" are just Java
> threads of only 1 instance of Cassandra per Server.
>
>
> On Sat, Jul 31, 2010 at 3:45 PM, Benjamin Black <b...@b3k.us> wrote:
>
>> Sorry, I just noticed: are you running 14 instances of Cassandra on a
>> single physical machine or are all those java processes something
>> else?
>>
>> On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall <da...@gmail.com>
>> wrote:
>> > I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7
>> > cores, RAID Disks).
>> >
>> > [...]
>>
>
>
Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Posted by Dathan Pattishall <da...@gmail.com>.
The output of htop shows threads as procs, with a per-thread breakdown of CPU
usage etc. (in ncurses color!). All of those Java "procs" are just threads of a
single Cassandra instance per server.
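For anyone unsure how htop ends up listing one JVM many times: on Linux each
thread gets its own LWP entry, but all threads of a process share the same PID.
A minimal Python sketch of that shared-PID behavior (illustrative only, nothing
Cassandra-specific):

```python
import os
import threading

# Each worker records its PID; threads of one process all share it,
# which is why htop (with thread display enabled) shows one java line
# per thread even though only a single Cassandra JVM is running.
pids = []

def record_pid():
    pids.append(os.getpid())

workers = [threading.Thread(target=record_pid) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# All four threads report the parent process's PID.
print(set(pids) == {os.getpid()})  # → True
```

In htop itself, F2 → Display options → "Hide userland process threads" collapses
the per-thread lines back to one entry per process.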
On Sat, Jul 31, 2010 at 3:45 PM, Benjamin Black <b...@b3k.us> wrote:
> Sorry, I just noticed: are you running 14 instances of Cassandra on a
> single physical machine or are all those java processes something
> else?
>
> On Mon, Jul 26, 2010 at 12:22 PM, Dathan Pattishall <da...@gmail.com>
> wrote:
> > I have 4 nodes on enterprise type hardware (Lots of Ram 12GB, 16 i7
> cores,
> > RAID Disks).
> >
> > ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats
> > Pool Name Active Pending Completed
> > STREAM-STAGE 0 0 0
> > RESPONSE-STAGE 0 0 516280
> > ROW-READ-STAGE 8 4096 1164326
> > LB-OPERATIONS 0 0 0
> > MESSAGE-DESERIALIZER-POOL 1 682008 1818682
> > GMFD 0 0 6467
> > LB-TARGET 0 0 0
> > CONSISTENCY-MANAGER 0 0 661477
> > ROW-MUTATION-STAGE 0 0 998780
> > MESSAGE-STREAMING-POOL 0 0 0
> > LOAD-BALANCER-STAGE 0 0 0
> > FLUSH-SORTER-POOL 0 0 0
> > MEMTABLE-POST-FLUSHER 0 0 4
> > FLUSH-WRITER-POOL 0 0 4
> > AE-SERVICE-STAGE 0 0 0
> > HINTED-HANDOFF-POOL 0 0 3
> >
> > EQX root@cass04:~# vmstat -n 1
> >
> > procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> >  r  b   swpd   free   buff    cache si so   bi  bo    in    cs us sy id wa st
> >  6 10   7096 121816  16244 10375492  0  0    1   3     0     0  5  1 94  0  0
> >  2 10   7096 116484  16248 10381144  0  0 5636   4 21210  9820  2  1 79 18  0
> >  1  9   7096 108920  16248 10387592  0  0 6216   0 21439  9878  2  1 81 16  0
> >  0  9   7096 129108  16248 10364852  0  0 6024   0 23280  8753  2  1 80 17  0
> >  2  9   7096 122460  16248 10370908  0  0 6072   0 20835  9461  2  1 83 14  0
> >  2  8   7096 115740  16260 10375752  0  0 5168 292 21049  9511  3  1 77 20  0
> >  1 10   7096 108424  16260 10382300  0  0 6244   0 21483  8981  2  1 75 22  0
> >  3  8   7096 125028  16260 10364104  0  0 5584   0 21238  9436  2  1 81 16  0
> >  3  9   7096 117928  16260 10370064  0  0 5988   0 21505 10225  2  1 77 19  0
> >  1  8   7096 109544  16260 10376640  0  0 6340  28 20840  8602  2  1 80 18  0
> >  0  9   7096 127028  16240 10357652  0  0 5984   0 20853  9158  2  1 79 18  0
> >  9  0   7096 121472  16240 10363492  0  0 5716   0 20520  8489  1  1 82 16  0
> >  3  9   7096 112668  16240 10369872  0  0 6404   0 21314  9459  2  1 84 13  0
> >  1  9   7096 127300  16236 10353440  0  0 5684   0 38914 10068  2  1 76 21  0
> >
> >
> > But the 16 cores are hardly utilized. Which indicates to me there is some
> > bad thread thrashing, but why?
> >
> >
> >
> > 1 [||||| 8.3%]   Tasks: 1070 total, 1 running
> > 2 [ 0.0%]   Load average: 8.34 9.05 8.82
> > 3 [ 0.0%]   Uptime: 192 days(!), 15:29:52
> > 4 [||||||||||| 17.9%]
> > 5 [||||| 5.7%]
> > 6 [|| 1.3%]
> > 7 [|| 2.6%]
> > 8 [| 0.6%]
> > 9 [| 0.6%]
> > 10 [|| 1.9%]
> > 11 [|| 1.9%]
> > 12 [|| 1.9%]
> > 13 [|| 1.3%]
> > 14 [| 0.6%]
> > 15 [|| 1.3%]
> > 16 [| 0.6%]
> > Mem[||||||||||||||||||||||||||||||||||||||||||||1791/12028MB]
> > Swp[| 6/1983MB]
> >
> > PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
> > 30269 root 40 0 14100 2116 900 R 4.0 0.0 0:00.49 htop
> > 24878 root 40 0 20.6G 8345M 6883M D 3.0 69.4 1:23.03
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24879 root 40 0 20.6G 8345M 6883M D 3.0 69.4 1:22.93
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24874 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:22.73
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24880 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:22.93
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24875 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:23.17
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24658 root 40 0 20.6G 8345M 6883M D 2.0 69.4 1:23.06
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24877 root 40 0 20.6G 8345M 6883M S 2.0 69.4 1:23.43
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24873 root 40 0 20.6G 8345M 6883M D 1.0 69.4 1:23.65
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24876 root 40 0 20.6G 8345M 6883M S 1.0 69.4 1:23.62
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24942 root 40 0 20.6G 8345M 6883M S 1.0 69.4 0:23.50
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24943 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:29.53
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24933 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:22.57
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 24939 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:12.73
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> > 25280 root 40 0 20.6G 8345M 6883M S 0.0 69.4 0:00.10
> > /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
> >
>
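The tpstats output above can also be scanned mechanically. A hypothetical
helper (not part of nodetool; the threshold is arbitrary) that flags stages
with a pending backlog, which here immediately singles out ROW-READ-STAGE and
MESSAGE-DESERIALIZER-POOL:

```python
# Sample of the `nodetool tpstats` output quoted in this thread.
TPSTATS = """\
Pool Name Active Pending Completed
ROW-READ-STAGE 8 4096 1164326
MESSAGE-DESERIALIZER-POOL 1 682008 1818682
ROW-MUTATION-STAGE 0 0 998780
"""

def backlogged(text, threshold=1000):
    """Return (pool name, pending) for stages whose queue exceeds threshold."""
    stages = []
    for line in text.splitlines()[1:]:           # skip the header row
        # The last three columns are Active/Pending/Completed; the
        # pool name is everything to their left.
        name, active, pending, completed = line.rsplit(None, 3)
        if int(pending) >= threshold:
            stages.append((name, int(pending)))
    return stages

print(backlogged(TPSTATS))
# → [('ROW-READ-STAGE', 4096), ('MESSAGE-DESERIALIZER-POOL', 682008)]
```

A deep MESSAGE-DESERIALIZER-POOL queue like this one means messages are
arriving faster than the downstream stages (here, the saturated
ROW-READ-STAGE) can drain them.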