Posted to olio-user@incubator.apache.org by Mingfan Lu <mi...@gmail.com> on 2009/09/25 02:25:21 UTC

Strange behavior when concurrent users go from 9K to 10K

I am using Faban/OlioPHP to stress a 16-core machine as the web server, together
with two other DB nodes (a master-slave cluster; the master uses a high-speed
SATA disk while the slave uses an SSD).

When the number of concurrent users scales from 9K through 10K, 11K, 12K, 13K,
14K, 15K, and 16K, the throughput first increases and then decreases:
Users   Throughput
9K      1810.323
10K     1969.393
11K     1859.053
12K     1842.368
13K     1849.213
14K     1843.955
It seems that there is some bottleneck here.
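
To make the shape of that curve easier to see, here is a small Python sketch
that uses only the throughput figures above and prints the per-1K-user
throughput and the step-to-step change; everything in it comes from the table,
nothing else is measured.

    # Throughput figures reported above (concurrent users -> throughput).
    results = {
        9_000: 1810.323,
        10_000: 1969.393,
        11_000: 1859.053,
        12_000: 1842.368,
        13_000: 1849.213,
        14_000: 1843.955,
    }

    prev = None
    for users, tput in sorted(results.items()):
        per_1k = tput / (users / 1000)          # throughput per 1K users
        delta = tput - prev if prev is not None else 0.0
        print(f"{users:>6} users: {tput:9.3f} total, "
              f"{per_1k:7.2f} per 1K users, delta {delta:+9.3f}")
        prev = tput

The per-1K-user column falls steadily from about 201 at 9K to about 132 at 14K,
which looks more like a shared resource saturating than like the workload itself
changing.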

For details, see the attached run.xml.
My ramp-up time is 300 s, the steady-state time is 600 s, and the ramp-down is 60 s.
The client start-up settings (a quick timing check follows the list):
    Time between starts (ms): 1
    Start simultaneously: No
    Start agents in parallel: No
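
As a quick sanity check on those settings, the sketch below (assuming
"Time between starts (ms): 1" means each simulated user thread starts 1 ms after
the previous one, and ignoring how users are divided among agents) estimates how
much of the 300 s ramp-up is spent just starting threads.

    # Rough check that thread start-up fits inside the ramp-up window,
    # assuming a 1 ms gap between successive user-thread starts and a
    # single agent; with N agents starting in parallel the time shrinks
    # roughly by a factor of N.
    users = 10_000
    time_between_starts_ms = 1
    ramp_up_s = 300

    startup_s = users * time_between_starts_ms / 1000
    print(f"Starting {users} users takes ~{startup_s:.0f} s, "
          f"i.e. {100 * startup_s / ramp_up_s:.0f}% of the {ramp_up_s} s ramp-up.")

At 10K users that is only about 10 s, so the start-up schedule itself should not
be what shapes the first 300 s of the run.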

But my profiling data shows that CPU (the peak is about 80%-90% when the number
of concurrent users is 10,000; softirq% is about 14% with 4 tx and 4 rx queues),
network bandwidth (70% of 1 Gb), memory usage, and disk are not the bottleneck.
The Apache error log is clean, with no exceptions or errors. I have also disabled
serving of static images (by removing all img tags from the HTML).
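
For what it's worth, a back-of-the-envelope division of the reported numbers
(assuming "70% of 1 Gb" means roughly 700 Mbit/s of traffic on the web server's
NIC during steady state, and taking the ~1969 ops/s peak as the request rate)
gives the average network cost per operation:

    # Average network bytes per operation, from the figures reported above.
    # Assumption: "70% of 1Gb" ~= 700 Mbit/s of NIC traffic in steady state.
    link_bits_per_s = 0.70 * 1e9
    ops_per_s = 1969.393

    bytes_per_op = link_bits_per_s / 8 / ops_per_s
    print(f"~{bytes_per_op / 1024:.0f} KiB of network traffic per operation")

That is roughly 43 KiB per operation even with image serving disabled, so it may
be worth confirming which responses carry that payload before ruling the network
out entirely.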

From the charts in
http://docs.google.com/present/view?id=df7282nf_30x8gwmrch&autoStart=true ,
the response time at 9K concurrent users is quite steady, but at 10K there is a
pulse that lasts about 600 s (what is happening?) before dropping to a very small
value in the last 300 s.
I want to know what causes this strange pulse when the concurrent users reach 10K.
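
One way to pin down where the pulse sits in time is to bucket individual
response-time samples into fixed windows. The sketch below is a minimal example
of that; the input format (one "epoch_seconds response_time_ms" pair per line)
is hypothetical, so the parsing would need to be adapted to whatever per-request
detail log is actually available.

    # Bucket (timestamp, response time) samples into 10 s windows and print
    # the average per window, to localize the response-time pulse.
    # Input format is hypothetical: "epoch_seconds response_time_ms" per line.
    import sys
    from collections import defaultdict

    WINDOW_S = 10
    buckets = defaultdict(list)

    for line in sys.stdin:
        parts = line.split()
        if len(parts) < 2:
            continue
        ts, rt = float(parts[0]), float(parts[1])
        buckets[int(ts) // WINDOW_S * WINDOW_S].append(rt)

    for start in sorted(buckets):
        samples = buckets[start]
        print(f"t={start:>10}s  n={len(samples):>6}  "
              f"avg_rt={sum(samples) / len(samples):8.1f} ms")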

Re: Strange behavior when concurrent users go from 9K to 10K

Posted by Mingfan Lu <mi...@gmail.com>.
Thanks for your nice reply.
Sorry, it seems the earlier mails hit some exceptions when their content was
extracted into the mail archive, so I resent the mail. I promise I won't do this
again.

It seems that my

On Sat, Sep 26, 2009 at 1:59 AM, Shanti Subramanyam - PAE <Shanti.Subramanyam@sun.com> wrote:

> We have received 3-4 copies of this email at different times - not sure
> what's going on.
>
> Congratulations on reaching such a high scale in your testing. As you
> probably know by now, scaling isn't a straightforward task and involves
> much analysis and tuning.
> If your CPU is 90% utilized at 10,000 users, I don't see how you can
> expect to get more throughput from this system. In fact, you will probably
> find that beyond a certain point (say 6,000 to 8,000 users), you will need
> more CPU per user as scalability drops.
> In any case, at such high rates there can be lots of issues, and it is
> difficult to predict what exactly you may be hitting.
> Shanti
>

Re: Strange behavior when concurrent users go from 9K to 10K

Posted by Shanti Subramanyam - PAE <Sh...@Sun.COM>.
We have received 3-4 copies of this email at different times - not sure
what's going on.

Congratulations on reaching such a high scale in your testing. As you
probably know by now, scaling isn't a straightforward task and involves
much analysis and tuning.
If your CPU is 90% utilized at 10,000 users, I don't see how you can
expect to get more throughput from this system. In fact, you will
probably find that beyond a certain point (say 6,000 to 8,000 users), you
will need more CPU per user as scalability drops.
In any case, at such high rates there can be lots of issues, and it is
difficult to predict what exactly you may be hitting.
Shanti
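
To put rough numbers on the "more CPU per user" point, here is a sketch that
uses only the figures from the original post (about 90% of 16 cores busy and
~1969 ops/s at 10,000 users); the inputs are taken as reported and the output is
purely illustrative.

    # CPU cost per user and per operation at the 10K-user point, using the
    # figures from the original post: ~90% of 16 cores and ~1969 ops/s.
    cores = 16
    cpu_util = 0.90
    users = 10_000
    ops_per_s = 1969.393

    core_fraction_per_user = cpu_util * cores / users
    cpu_ms_per_op = cpu_util * cores / ops_per_s * 1000

    print(f"~{core_fraction_per_user * 100:.3f}% of one core per concurrent user")
    print(f"~{cpu_ms_per_op:.1f} ms of CPU per operation")

If either figure creeps up as users are added, the 16 cores run out well before
the nominal headroom suggests, which would be consistent with the throughput
flattening after 10K.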
