Posted to olio-user@incubator.apache.org by Mingfan Lu <mi...@gmail.com> on 2009/09/24 09:03:34 UTC

Strange bottleneck after concurrent users reach 10K

I am using oliophp to stress a 16-core machine acting as the web server, with
two other DB nodes (a master-slave cluster; the master uses a SATA disk while
the slave uses an SSD).
When the number of concurrent users scales from 9K through 10K, 11K, 12K, 13K,
14K, 15K to 16K, the throughput first increases and then decreases. It seems
that there is some bottleneck here.
Users    Throughput (Ops)
14000    1843.955
13000    1849.213
12000    1842.368
11000    1859.053
10000    1969.393
 9000    1810.323

My ramp time is 300s while steady time is 600s and the rampdown is 60s.
The client start up:
    Time between starts (ms) :1
    Start simultaneously: No
    Start agents in parallel: No
See the attached run.xml.

But my profiling data shows that CPU (the peak is about 80%~90% at 10000
concurrent users; softirq% is about 14% with 4 tx and 4 rx queues), network
bandwidth (70% of 1Gb), memory usage, and disk are not the bottleneck.
The Apache error log is clean, with no exceptions or errors. I have also
disabled static image serving (by disabling all <img> tags in the HTML).
From the pictures in
http://docs.google.com/present/view?id=df7282nf_30x8gwmrch&autoStart=true ,
with 9K concurrent users the response time is fairly steady. With 10K, there
is a pulse lasting about 600 sec (what happened?), after which the response
time drops to a very small value for the last 300 sec.
What causes this strange pulse when the number of concurrent users reaches 10K?

Strange bottleneck after concurrent users reach 10K

Posted by Mingfan Lu <mi...@gmail.com>.

I am using oliophp to stress a 16-core machine acting as the web server, with
two other DB nodes (a master-slave cluster; the master uses a SATA disk while
the slave uses an SSD).
When the number of concurrent users scales from 9K through 10K, 11K, 12K, 13K,
14K, 15K to 16K, the throughput first increases and then decreases.
Users    Throughput (Ops)
 9K      1810.323
10K      1969.393
11K      1859.053
12K      1842.368
13K      1849.213
14K      1843.955

It seems that there is some bottleneck here.
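
To make the plateau easier to see, here is a minimal Python sketch that takes the throughput numbers above and computes throughput per user and how far each run falls below the best run. It assumes the "Ops" figures are operations per second; the function names are only illustrative and are not part of Olio or the benchmark harness.

```python
# Throughput figures copied from the table above (users -> ops).
# Assumption: the values are operations per second.
throughput = {
    9000: 1810.323,
    10000: 1969.393,
    11000: 1859.053,
    12000: 1842.368,
    13000: 1849.213,
    14000: 1843.955,
}


def per_user_rates(data):
    """Return {users: throughput divided by users} for each load level."""
    return {users: ops / users for users, ops in data.items()}


def drop_from_peak(data):
    """Return (peak_users, {users: percent below the peak throughput})."""
    peak_users = max(data, key=data.get)
    peak = data[peak_users]
    drops = {users: 100.0 * (peak - ops) / peak for users, ops in data.items()}
    return peak_users, drops


if __name__ == "__main__":
    rates = per_user_rates(throughput)
    peak_users, drops = drop_from_peak(throughput)
    for users in sorted(throughput):
        print(
            f"{users:>6} users  {throughput[users]:9.3f} ops  "
            f"{rates[users]:.4f} ops/user  {drops[users]:5.2f}% below peak"
        )
```

Total throughput stays near 1850 from 11K users onward while per-user throughput keeps falling, which is what a saturated resource would look like. The numbers alone do not say which resource it is.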

For details, see the attached run.xml.
My ramp time is 300s while steady time is 600s and the rampdown is 60s.
The client start up:
    Time between starts (ms) :1
    Start simultaneously: No
    Start agents in parallel: No

But my profiling data shows that CPU (the peak is about 80%~90% at 10000
concurrent users; softirq% is about 14% with 4 tx and 4 rx queues), network
bandwidth (70% of 1Gb), memory usage, and disk are not the bottleneck.
The Apache error log is clean, with no exceptions or errors. I have also
disabled static image serving (by disabling all <img> tags in the HTML).

From the pictures in
http://docs.google.com/present/view?id=df7282nf_30x8gwmrch&autoStart=true ,
with 9K concurrent users the response time is fairly steady. With 10K, there
is a pulse lasting about 600 sec (what happened?), after which the response
time drops to a very small value for the last 300 sec.
What causes this strange pulse when the number of concurrent users reaches 10K?