Posted to users@trafficserver.apache.org by Ziv Maor <zm...@cuppcomputing.com> on 2013/12/12 16:35:59 UTC

10% CPU usage on idle

Hi,

I'm using ATS on an ARM-based chip, and noticed that when I'm
running the server in an idle state (i.e. no client is connected to it)
the CPU usage of ATS is around 10%. Since I'm using an ARM chip,
this is not a negligible amount of CPU usage.

After reviewing the code, it looks like the fault lies in these two places:

- the timeout parameter supplied to the epoll_wait() call in
NetHandler::mainNetEvent(); currently the timeout is set to 10
milliseconds.

- the timeout parameter supplied to the ink_cond_timedwait() call
in aio_thread_main(); again, set to 10 milliseconds.

Increasing just the epoll_wait timeout from 10 milliseconds to 100
milliseconds results in excessive CPU usage in the AIO thread (almost
reaching 60%). Increasing the AIO thread's timeout as well, again from
10 milliseconds to 100 milliseconds, finally balanced the threads' CPU
usage and achieved the desired effect of 0% CPU usage.

Has anyone encountered the same issue, or would anyone like to comment
on whether this is the correct way to fix it?

Thanks,
Ziv
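
To make the arithmetic above concrete: with a 10 millisecond timeout, an
idle thread still returns from epoll_wait() roughly 100 times per second,
and every polling thread pays that cost. A minimal sketch of the pattern
(illustrative only; the names and loop body are not the actual ATS code):

#include <sys/epoll.h>

/* Illustrative poll loop: with timeout_ms = 10, an idle thread wakes
 * ~100 times per second; N such threads produce N * 100 wakeups/second,
 * which is where the idle CPU usage comes from. */
void net_thread_loop(int epfd, int timeout_ms)
{
  struct epoll_event events[16];
  for (;;) {
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    if (n <= 0)
      continue; /* timed out or interrupted: nothing to do, but we still woke up */
    /* ... dispatch the n ready events ... */
  }
}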

Re: 10% CPU usage on idle

Posted by James Peach <jp...@apache.org>.
On Dec 15, 2013, at 3:26 PM, Reindl Harald <h....@thelounge.net> wrote:

> 
> 
> On 16.12.2013 00:02, Uri Shachar wrote:
>> On Thu, 12 Dec 2013 19:02:01 +0200 Ziv Maor wrote:
>>> 
>>> Can anyone evaluate the impact of changing the timeout values in these 2 places?
>>> 
>> ...
>> 
>> It really depends on the hardware/workload/usage pattern - but if you adjust the timeouts like this you can expect a significantly higher average latency...
>> Did you try reducing the number of threads? If you have a light workload it could give you the same benefits without the latency impact. 
>> (records.config
>> CONFIG proxy.config.exec_thread.autoconfig INT 0
>> CONFIG proxy.config.exec_thread.limit INT X
>> )
> 
> what makes me wonder is that ATS appears to be the only server software with such
> idle CPU usage, and I have dealt with nearly every server type over the last 10 years
> 
> *what* exactly is producing this CPU usage, where any other software at idle goes
> down to zero and allows the CPU to go into sleep mode?

There are two levels of event loop polling. At a scheduled interval, each thread drains the event loop, polling until there are no more events. I don't have any objection to changing this, but it's a fair amount of work to demonstrate the impact on a variety of different workloads. If there's a volunteer who wants to work on it, I might be able to lend a hand with the testing.

J
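
A rough sketch of the two-level structure James describes (illustrative
only; the names and structure are assumptions, not the actual ATS event
system):

#include <sys/epoll.h>

/* Outer level: block for at most the scheduled interval, even when idle.
 * Inner level: once woken, drain ready events with non-blocking polls
 * until a poll returns no events. */
void event_thread_loop(int epfd, int interval_ms)
{
  struct epoll_event events[16];
  for (;;) {
    int n = epoll_wait(epfd, events, 16, interval_ms); /* outer, blocking poll */
    while (n > 0) {
      /* ... process the n ready events ... */
      n = epoll_wait(epfd, events, 16, 0); /* inner, non-blocking drain */
    }
  }
}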

Re: 10% CPU usage on idle

Posted by Reindl Harald <h....@thelounge.net>.

On 16.12.2013 00:02, Uri Shachar wrote:
> On Thu, 12 Dec 2013 19:02:01 +0200 Ziv Maor wrote:
>>
>> Can anyone evaluate the impact of changing the timeout values in these 2 places?
>>
> ...
> 
> It really depends on the hardware/workload/usage pattern - but if you adjust the timeouts like this you can expect a significantly higher average latency...
> Did you try reducing the number of threads? If you have a light workload it could give you the same benefits without the latency impact. 
> (records.config
> CONFIG proxy.config.exec_thread.autoconfig INT 0
> CONFIG proxy.config.exec_thread.limit INT X
> )

what makes me wonder is that ATS appears to be the only server software with such
idle CPU usage, and I have dealt with nearly every server type over the last 10 years

*what* exactly is producing this CPU usage, where any other software at idle goes
down to zero and allows the CPU to go into sleep mode?



Re: 10% CPU usage on idle

Posted by Ziv Maor <zm...@cuppcomputing.com>.
Well, yes, I did. Reducing the number of threads does reduce the
number of CPU cycles used and results in ~0.5% CPU usage per thread, which
obviously wasn't enough. The only solution I could find was compiling
ATS with native AIO support and using the -t option set to 100; these
settings did the trick for me and resulted in near-0% CPU usage at idle.

Ziv
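
For reference, the combination Ziv describes would look roughly like this
(the configure flag name is an assumption and should be checked against
./configure --help; the -t option is the one discussed later in this thread):

./configure --enable-linux-native-aio   # assumed flag name for native AIO support
make && make install
traffic_server -t 100                   # 100 ms poll timeout, as described above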



On Mon, Dec 16, 2013 at 1:02 AM, Uri Shachar <us...@hotmail.com> wrote:

> On Thu, 12 Dec 2013 19:02:01 +0200 Ziv Maor wrote:
> >
> > Can anyone evaluate the impact of changing the timeout values in these 2
> places?
> >
> ...
>
> It really depends on the hardware/workload/usage pattern - but if you
> adjust the timeouts like this you can expect a significantly higher average
> latency...
> Did you try reducing the number of threads? If you have a light workload
> it could give you the same benefits without the latency impact.
> (records.config
> CONFIG proxy.config.exec_thread.autoconfig INT 0
> CONFIG proxy.config.exec_thread.limit INT X
> )
>
>      --Uri

RE: 10% CPU usage on idle

Posted by Uri Shachar <us...@hotmail.com>.
On Thu, 12 Dec 2013 19:02:01 +0200 Ziv Maor wrote:
> 
> Can anyone evaluate the impact of changing the timeout values in these 2 places?
> 
...

It really depends on the hardware/workload/usage pattern - but if you adjust the timeouts like this you can expect a significantly higher average latency...
Did you try reducing the number of threads? If you have a light workload it could give you the same benefits without the latency impact. 
(records.config
CONFIG proxy.config.exec_thread.autoconfig INT 0
CONFIG proxy.config.exec_thread.limit INT X
)

     --Uri
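
For context on Uri's suggestion: as I understand these settings, setting
proxy.config.exec_thread.autoconfig to 0 disables the automatic sizing of
the event-thread pool, and proxy.config.exec_thread.limit then pins the
thread count to X. Since each event thread runs its own polling loop,
fewer threads means proportionally fewer idle wakeups.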

Re: 10% CPU usage on idle

Posted by "Adam W. Dace" <co...@gmail.com>.
> It'd be really good to see what kind of impact this has on performance,
> though, not just on doing nothing ;)

Well said.



On Fri, Dec 13, 2013 at 8:36 AM, Igor Galić <i....@brainsware.org> wrote:

>
> You probably should move this conversation over to Jira (sign up, find the
> appropriate issue, and provide a patch).
>
> If it means anything I thought I'd give this a try.  I changed both
> timeouts on the ATS v4.0.2 codebase.
> On Mac OS X my idle CPU usage went from 0.7% down to 0.2%.  On CentOS
> Linux VM my idle CPU usage
> went from about 1% to 0.3%.
>
>
> It'd be really good to see what kind of impact this has on performance,
> though, not just on doing nothing ;)
>
> Hope this helps.
>
> Regards,
>
>
>
> On Thu, Dec 12, 2013 at 11:02 AM, Ziv Maor <zm...@cuppcomputing.com>wrote:
>
>> Hi,
>>
>> Can anyone evaluate the impact of changing the timeout values in these 2
>> places?
>>
>> Thanks,
>>
>> Ziv
>>
>> Sent from my iPhone
>>
>> > On 12 Dec 2013, at 17:43, Reindl Harald <h....@thelounge.net> wrote:
>> >
>> > that's a long existing issue which is a show-stopper to let
>> > ATS run on build and testing-VM's because the VM eats
>> > 150 MHz CPU idle while any other guest goes down to 0
>> >
>> > there is a bugtracker-entry somewhere
>> >
>> > one may say 150 MHz is not much, on the other hand the
>> > backup-host with 12 replication slaves and a few WinXP
>> > test-machines eats around 400 MHz all the time
>> >
>> > On 12.12.2013 16:35, Ziv Maor wrote:
>> >> ...
>> >
>> >
>>
>


-- 
____________________________________________________________
Adam W. Dace <co...@gmail.com>

Phone: (815) 355-5848
Instant Messenger: AIM & Yahoo! IM - colonelforbin74 | ICQ - #39374451
Microsoft Messenger - colonelforbin74@live.com <ad...@turing.com>

Google Profile: https://plus.google.com/u/0/109309036874332290399/about

Re: 10% CPU usage on idle

Posted by Igor Galić <i....@brainsware.org>.
----- Original Message -----

> You probably should move this conversation over to Jira (sign up, find the
> appropriate issue, and provide a patch).

> If it means anything I thought I'd give this a try. I changed both timeouts
> on the ATS v4.0.2 codebase.
> On Mac OS X my idle CPU usage went from 0.7% down to 0.2%. On CentOS Linux VM
> my idle CPU usage
> went from about 1% to 0.3%.

It'd be really good to see what kind of impact this has on performance, though, not just on doing nothing ;) 

> Hope this helps.

> Regards,

> On Thu, Dec 12, 2013 at 11:02 AM, Ziv Maor < zmaor@cuppcomputing.com > wrote:

> > ...


-- 
Igor Galić 

Tel: +43 (0) 664 886 22 883 
Mail: i.galic@brainsware.org 
URL: http://brainsware.org/ 
GPG: 8716 7A9F 989B ABD5 100F 4008 F266 55D6 2998 1641 

Re: 10% CPU usage on idle

Posted by "Adam W. Dace" <co...@gmail.com>.
You probably should move this conversation over to Jira (sign up, find the
appropriate issue, and provide a patch).

If it means anything, I thought I'd give this a try. I changed both
timeouts on the ATS v4.0.2 codebase. On Mac OS X my idle CPU usage went
from 0.7% down to 0.2%. On a CentOS Linux VM my idle CPU usage went from
about 1% to 0.3%.

Hope this helps.

Regards,



On Thu, Dec 12, 2013 at 11:02 AM, Ziv Maor <zm...@cuppcomputing.com> wrote:

> Hi,
>
> Can anyone evaluate the impact of changing the timeout values in these 2
> places?
>
> Thanks,
>
> Ziv
>
> Sent from my iPhone
>
> > On 12 Dec 2013, at 17:43, Reindl Harald <h....@thelounge.net> wrote:
> >
> > that's a long existing issue which is a show-stopper to let
> > ATS run on build and testing-VM's because the VM eats
> > 150 MHz CPU idle while any other guest goes down to 0
> >
> > there is a bugtracker-entry somewhere
> >
> > one may say 150 MHz is not much, on the other hand the
> > backup-host with 12 replication slaves and a few WinXP
> > test-machines eats around 400 MHz all the time
> >
> > On 12.12.2013 16:35, Ziv Maor wrote:
> >> ...
> >
> >
>



-- 
____________________________________________________________
Adam W. Dace <co...@gmail.com>

Phone: (815) 355-5848
Instant Messenger: AIM & Yahoo! IM - colonelforbin74 | ICQ - #39374451
Microsoft Messenger - colonelforbin74@live.com <ad...@turing.com>

Google Profile: https://plus.google.com/u/0/109309036874332290399/about

Re: 10% CPU usage on idle

Posted by Ziv Maor <zm...@cuppcomputing.com>.
Hi,

Can anyone evaluate the impact of changing the timeout values in these 2 places?

Thanks,

Ziv

Sent from my iPhone

> On 12 Dec 2013, at 17:43, Reindl Harald <h....@thelounge.net> wrote:
> 
> that's a long existing issue which is a show-stopper to let
> ATS run on build and testing-VM's because the VM eats
> 150 MHz CPU idle while any other guest goes down to 0
> 
> there is a bugtracker-entry somewhere
> 
> one may say 150 MHz is not much, on the other hand the
> backup-host with 12 replication slaves and a few WinXP
> test-machines eats around 400 MHz all the time
> 
> On 12.12.2013 16:35, Ziv Maor wrote:
>> ...
> 
> 

Re: 10% CPU usage on idle

Posted by Reindl Harald <h....@thelounge.net>.
that's a long-existing issue and a show-stopper for letting
ATS run on build and testing VMs, because the VM eats
150 MHz of CPU while idle while any other guest goes down to 0

there is a bugtracker entry somewhere

one may say 150 MHz is not much; on the other hand, the
backup host with 12 replication slaves and a few WinXP
test machines eats around 400 MHz all the time

On 12.12.2013 16:35, Ziv Maor wrote:
> ...



Re: 10% CPU usage on idle

Posted by Igor Galić <i....@brainsware.org>.
It's documented here 
https://issues.apache.org/jira/browse/TS-1336 
and here 
https://issues.apache.org/jira/browse/TS-1365 

Would be /really/ nice to get a fix for that. 

----- Original Message -----

> ...

-- 
Igor Galić 

Tel: +43 (0) 664 886 22 883 
Mail: i.galic@brainsware.org 
URL: http://brainsware.org/ 
GPG: 8716 7A9F 989B ABD5 100F 4008 F266 55D6 2998 1641 

Re: 10% CPU usage on idle

Posted by "Adam W. Dace" <co...@gmail.com>.
Leif wrote:
> I’ve landed this on the current master. Take it out for a spin, and see
> how it works. The sooner the better, so we know it
> works for v4.2.0 :).

Following the instructions on the Wiki (building), I went ahead and did a
git clone today of what I'm guessing is the master you speak of.

I installed it on my Mac child cache and noticed a drop of CPU used when
idle from 0.6%
without the new setting to 0.1% with it enabled.

For right now, I added this in records.config:

CONFIG proxy.config.net.poll_timeout INT 150

Normally I wouldn't concern myself much with the CPU used when idle, but
leave it to Apple to design hardware where using one CPU core and the
graphics card at full tilt doesn't faze it...but too much 1-5% CPU usage
tends to cause a lockup.

Thanks again.

Regards,

Adam


On Fri, Jan 3, 2014 at 11:40 AM, Leif Hedstrom <zw...@apache.org> wrote:

>
> On Jan 2, 2014, at 11:55 AM, Adam W. Dace <co...@gmail.com>
> wrote:
>
> > Leif wrote:
> > > I didn't hear any objections or complaints on this, so I'll finish
> > > this patch, document the new configuration option, and
> > > add it to master. The defaults will be exactly the same behavior as
> > > today. I've taken on both TS-1336 and TS-1365, both
> > > of which are addressed with the changes.
> >
> > Thanks a bunch for this, I'll be using it.
>
>
> I’ve landed this on the current master. Take it out for a spin, and see
> how it works. The sooner the better, so we know it works for v4.2.0 :).
>
> Cheers,
>
> — leif
>
>


-- 
____________________________________________________________
Adam W. Dace <co...@gmail.com>

Phone: (815) 355-5848
Instant Messenger: AIM & Yahoo! IM - colonelforbin74 | ICQ - #39374451
Microsoft Messenger - colonelforbin74@live.com <ad...@turing.com>

Google Profile: https://plus.google.com/u/0/109309036874332290399/about

Re: 10% CPU usage on idle

Posted by Reindl Harald <h....@thelounge.net>.

On 03.01.2014 18:40, Leif Hedstrom wrote:
> On Jan 2, 2014, at 11:55 AM, Adam W. Dace <co...@gmail.com> wrote:
> 
>> Leif wrote:
>>> I didn't hear any objections or complaints on this, so I'll finish this patch, document the new configuration option, and 
>>> add it to master. The defaults will be exactly the same behavior as today. I've taken on both TS-1336 and TS-1365, both
>>> of which are addressed with the changes.
>>
>> Thanks a bunch for this, I'll be using it.
> 
> 
> I’ve landed this on the current master. Take it out for a spin, and see how it works. The sooner the better, so we know it works for v4.2.0 :).

can i have a tarball like "trafficserver-4.1.2.tar.bz2" to feed to rpmbuild?
i am happy to test this first on some testing environment and finally
also in production, since for now the ATS machine is more a weapon for
extreme load that currently does not exist


Re: 10% CPU usage on idle

Posted by Leif Hedstrom <zw...@apache.org>.
On Jan 2, 2014, at 11:55 AM, Adam W. Dace <co...@gmail.com> wrote:

> Leif wrote:
> > I didn't hear any objections or complaints on this, so I'll finish this patch, document the new configuration option, and 
> > add it to master. The defaults will be exactly the same behavior as today. I've taken on both TS-1336 and TS-1365, both
> > of which are addressed with the changes.
> 
> Thanks a bunch for this, I'll be using it.


I’ve landed this on the current master. Take it out for a spin, and see how it works. The sooner the better, so we know it works for v4.2.0 :).

Cheers,

— leif


Re: 10% CPU usage on idle

Posted by "Adam W. Dace" <co...@gmail.com>.
Leif wrote:
> I didn't hear any objections or complaints on this, so I'll finish this
> patch, document the new configuration option, and
> add it to master. The defaults will be exactly the same behavior as
> today. I've taken on both TS-1336 and TS-1365, both
> of which are addressed with the changes.

Thanks a bunch for this, I'll be using it.

Regards,



On Fri, Dec 27, 2013 at 7:41 AM, Igor Galić <i....@brainsware.org> wrote:

> This new option needs documentation on how to scale it for what kind of
> workload.
>
> ...


-- 
____________________________________________________________
Adam W. Dace <co...@gmail.com>

Phone: (815) 355-5848
Instant Messenger: AIM & Yahoo! IM - colonelforbin74 | ICQ - #39374451
Microsoft Messenger - colonelforbin74@live.com <ad...@turing.com>

Google Profile: https://plus.google.com/u/0/109309036874332290399/about

Re: 10% CPU usage on idle

Posted by Leif Hedstrom <zw...@apache.org>.
On Dec 27, 2013, at 6:41 AM, Igor Galić <i....@brainsware.org> wrote:
> 
> This new option needs documentation on how to scale it for what kind of workload.

There is really no workload you want to scale this for. The only reason to touch it is to avoid idle CPU usage, e.g. on a VM or shared system. That is how I intended to document it.

-- Leif 

> ...

Re: 10% CPU usage on idle

Posted by Igor Galić <i....@brainsware.org>.
This new option needs documentation on how to scale it for what kind of workload. 

----- Original Message -----

> On Dec 19, 2013, at 10:45 AM, Leif Hedstrom < zwoop@apache.org > wrote:

> ...

-- 
Igor Galić 

Tel: +43 (0) 664 886 22 883 
Mail: i.galic@brainsware.org 
URL: http://brainsware.org/ 
GPG: 8716 7A9F 989B ABD5 100F 4008 F266 55D6 2998 1641 

Re: 10% CPU usage on idle

Posted by Leif Hedstrom <zw...@apache.org>.
On Dec 19, 2013, at 10:45 AM, Leif Hedstrom <zw...@apache.org> wrote:

> 
> On Dec 12, 2013, at 8:35 AM, Ziv Maor <zm...@cuppcomputing.com> wrote:
> 
>> ...
> 
> Thanks for this detailed investigation! This actually does make a lot of sense. How do you feel about the patch below? It does the following:
> 
> 1) Standardizes the relevant timeouts to consistently use net_config_poll_timeout, as provided with the -t option. In particular, it makes sure we use the same timer for both the epoll_wait() and the AIO ink_cond_timedwait(). This does indeed avoid the excessive CPU usage when using -t.
> 
> 2) It additionally adds a new configuration option that can be used instead of the -t option. It’s named proxy.config.net.poll_timeout.


I didn't hear any objections or complaints on this, so I'll finish this patch, document the new configuration option, and add it to master. The defaults will be exactly the same behavior as today. I've taken on both TS-1336 and TS-1365, both of which are addressed with the changes.

Cheers,

-- Leif

> ...


Re: 10% CPU usage on idle

Posted by Leif Hedstrom <zw...@apache.org>.
On Dec 12, 2013, at 8:35 AM, Ziv Maor <zm...@cuppcomputing.com> wrote:

> ...

Thanks for this detailed investigation! This actually does make a lot of sense. How do you feel about the patch below? It does the following:

1) Standardizes the relevant timeouts to consistently use net_config_poll_timeout, as provided with the -t option. In particular, it makes sure we use the same timer for both the epoll_wait() and the AIO ink_cond_timedwait(). This does indeed avoid the excessive CPU usage when using -t.

2) It additionally adds a new configuration option that can be used instead of the -t option. It’s named proxy.config.net.poll_timeout.


Setting this to e.g. 100ms seems fine. It can increase latencies in some cases, so we should leave the default at 10ms.
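
With the patch applied, that would look like this in records.config (the
-1 default preserves today's behavior):

CONFIG proxy.config.net.poll_timeout INT 100

The worst-case delay added for an event arriving on an otherwise idle loop
is roughly one poll timeout, so a 100 ms setting trades up to ~100 ms of
extra wakeup latency for the lower idle CPU usage.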

Thoughts?

— leif

diff --git a/iocore/aio/AIO.cc b/iocore/aio/AIO.cc
index 2e3e91f..ac8ea0f 100644
--- a/iocore/aio/AIO.cc
+++ b/iocore/aio/AIO.cc
@@ -538,9 +538,8 @@ aio_thread_main(void *arg)
         op->thread->schedule_imm_signal(op);
       ink_mutex_acquire(&my_aio_req->aio_mutex);
     } while (1);
-    timespec ten_msec_timespec = ink_based_hrtime_to_timespec(ink_get_hrtime() + HRTIME_MSECONDS(10));
-    ink_cond_timedwait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex,
-                       &ten_msec_timespec);
+    timespec timedwait_msec = ink_based_hrtime_to_timespec(ink_get_hrtime() + HRTIME_MSECONDS(net_config_poll_timeout));
+    ink_cond_timedwait(&my_aio_req->aio_cond, &my_aio_req->aio_mutex, &timedwait_msec);
   }
   return 0;
 }
diff --git a/iocore/eventsystem/I_SocketManager.h b/iocore/eventsystem/I_SocketManager.h
index 57802f4..d294a5c 100644
--- a/iocore/eventsystem/I_SocketManager.h
+++ b/iocore/eventsystem/I_SocketManager.h
@@ -40,6 +40,7 @@
 #define DEFAULT_OPEN_MODE                         0644
 
 class Thread;
+extern int net_config_poll_timeout;
 
 #define SOCKET int
 
@@ -85,7 +86,7 @@ struct SocketManager
   int epoll_create(int size);
   int epoll_close(int eps);
   int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
-  int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
+  int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout = net_config_poll_timeout);
 #endif
 #if TS_USE_KQUEUE
   int kqueue();
diff --git a/mgmt/RecordsConfig.cc b/mgmt/RecordsConfig.cc
index 134e029..cfbcf2d 100644
--- a/mgmt/RecordsConfig.cc
+++ b/mgmt/RecordsConfig.cc
@@ -796,6 +796,8 @@ RecordElement RecordsConfig[] = {
   ,
   {RECT_CONFIG, "proxy.config.net.sock_mss_in", RECD_INT, "0", RECU_NULL, RR_NULL, RECC_NULL, NULL, RECA_NULL}
   ,
+  {RECT_CONFIG, "proxy.config.net.poll_timeout", RECD_INT, "-1", RECU_NULL, RR_NULL, RECC_NULL, NULL, RECA_NULL}
+  ,
 
   //##############################################################################
   //#
diff --git a/proxy/Main.cc b/proxy/Main.cc
index 114ade8..64a1357 100644
--- a/proxy/Main.cc
+++ b/proxy/Main.cc
@@ -1451,6 +1451,15 @@ main(int /* argc ATS_UNUSED */, char **argv)
   size_t stacksize;
   REC_ReadConfigInteger(stacksize, "proxy.config.thread.default.stacksize");
 
+  // This has special semantics, -1 means use system defaults *or* whatever is provided from command line (-t).
+  // This is necessary to preserve backwards compatibility. A value of 0 is probably always undesirable.
+  int conf_poll_timeout;
+  REC_ReadConfigInteger(conf_poll_timeout, "proxy.config.net.poll_timeout");
+
+  if (conf_poll_timeout >= 0) {
+    net_config_poll_timeout = conf_poll_timeout;
+  }
+
   ink_event_system_init(makeModuleVersion(1, 0, PRIVATE_MODULE_HEADER));
   ink_net_init(makeModuleVersion(1, 0, PRIVATE_MODULE_HEADER));
   ink_aio_init(makeModuleVersion(1, 0, PRIVATE_MODULE_HEADER));
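
One detail worth noting in the I_SocketManager.h hunk above: giving
SocketManager::epoll_wait() a default timeout argument bound to the global
net_config_poll_timeout means every existing call site picks up the
configured value without having to be changed individually.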