Posted to users@trafficserver.apache.org by Daniel Biazus <da...@azion.com> on 2015/01/11 21:52:06 UTC

Interim cache - High CPU usage

Hi Guys,

We've been using ATS as a reverse proxy, and a few weeks ago we started to
use the interim cache feature more intensively, caching objects with an
average size of 200 MB and a maximum size of 1 GB.

We have a ~1 TB HDD as the default storage:

cat /etc/trafficserver/storage.config

# ATS - Storage
/dev/sda6 volume=1

And also, a 120 GB SSD as the interim cache:

LOCAL proxy.config.cache.interim.storage STRING /dev/sdc1
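For context, the interim cache is driven entirely from records.config; a minimal sketch of the relevant lines (the migrate_threshold value shown is the documented default, not our production setting):

```
# records.config (sketch; interim cache settings only)
LOCAL proxy.config.cache.interim.storage STRING /dev/sdc1
# How many HDD reads an object needs before it is migrated to the interim SSD
CONFIG proxy.config.cache.interim.migrate_threshold INT 2
```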

After 20-30 minutes in production with this configuration, we noticed a
sudden spike in CPU usage, climbing to 65%, whereas our regular usage is
around 10%. The throughput, however, remained stable at 250 Mbps per box.

We've found the following behavior using the perf top tool:

   88.18%  traffic_server      [.]
_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread
     0.32%  traffic_server      [.] _ZN10NetHandler12mainNetEventEiP5Event
     0.30%  [kernel]             [k] update_sd_lb_stats
     0.29%  [e1000e]           [k] e1000e_check_ltr_demote
     0.25%  [kernel]             [k] __ticket_spin_lock
     0.24%  traffic_server      [.] _ZN7EThread13process_eventEP5Eventi
     0.21%  [kernel]             [k] timerqueue_add
     0.17%  libc-2.12.so       [.] epoll_wait
     0.17%  libpcre.so.0.0.   [.] 0x00000000000100dd
     0.14%  [kernel]             [k] __schedule

1) This behavior is easily reproduced when caching large objects with the
interim cache active.
2) With the interim cache disabled, the behavior is not reproduced.
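As a side note for readers of the profile above, the hot symbol is C++-mangled; c++filt from binutils recovers the readable signature:

```shell
# Demangle the hottest symbol from the perf top output
echo '_Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread' | c++filt
# prints: write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)
```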

As you can see in the perf top output, the write_to_net_io function is
responsible for the heavy CPU usage. We would like to hear from you: has
anyone faced an issue like this, or do you have any clues about this
possible bug?

Thanks & Regards,

-- 

Daniel Biazus
infrastructure Engineering
Azion Technologies
Porto Alegre, Brasil +55 51 3012 3005 | +55 51 82279032
Miami, USA +1 305 704 8816

Quaisquer informações contidas neste e-mail e anexos podem ser
confidenciais e privilegiadas, protegidas por sigilo legal. Qualquer forma
de utilização deste documento depende de autorização do emissor, sujeito as
penalidades cabíveis.

Any information in this e-mail and attachments may be confidential and
privileged, protected by legal confidentiality. The use of this document
require authorization by the issuer, subject to penalties.

Re: Interim cache - High CPU usage

Posted by Thomas Jackson <ja...@gmail.com>.
Personally, I'd avoid the interim cache altogether. If you want tiered
storage, it is better to do it at the block level. Newer Linux kernels
include a few options (I'd use bcache: http://en.wikipedia.org/wiki/Bcache);
otherwise, you can compile kernel modules for others (such as flashcache).
These use block-level caching, so they do a better job of caching hot
blocks instead of full HTTP responses, which is especially useful if
clients abort downloads soon after starting. It also lets your SSD hold
parts of your files instead of all or nothing.
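For the record, a rough sketch of what a bcache setup could look like with the devices from this thread (device names taken from the original post; these commands reformat both devices, so treat this as an outline, not a recipe):

```shell
# Make the HDD partition a bcache backing device (it will appear as /dev/bcache0)
make-bcache -B /dev/sda6
# Make the SSD a cache set; note the "Set UUID" it prints
make-bcache -C /dev/sdc1
# Attach the SSD cache set to the backing device
echo <set-uuid> > /sys/block/bcache0/bcache/attach
# Then point ATS at the combined device in storage.config:
#   /dev/bcache0 volume=1
```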

On Mon, Jan 12, 2015 at 9:46 AM, Daniel Biazus <da...@azion.com>
wrote:


Re: Interim cache - High CPU usage

Posted by Daniel Biazus <da...@azion.com>.
Hi guys,

    I agree with your approach; I also think that objects are being
migrated too frequently between the storage and the interim cache. However,
I can't see any workaround for this issue, since the interim cache doesn't
have a setting that limits the object size on the SSD storage the way the
RAM cache does with "proxy.config.cache.ram_cache_cutoff".
    I tried setting the "proxy.config.cache.interim.migrate_threshold"
variable to a higher value, something like 20, but it is not enough:
once a large object (1 GB) reaches this threshold, it is copied to the
interim cache, overloading the CPU and I/O.
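For concreteness, the RAM-cache knob referred to above, with its documented default (4 MB, expressed in bytes); the point is that the interim cache has no equivalent size cutoff:

```
CONFIG proxy.config.cache.ram_cache_cutoff INT 4194304
# Objects larger than this bypass the RAM cache; no such cutoff exists for the interim cache.
```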

perf record output: [inline image, not preserved in this plain-text archive]

Does anyone have an insight into this situation?

Let me know if you guys need more info.

Thank You & Best Regards,


On Mon, Jan 12, 2015 at 1:49 AM, gang li <po...@gmail.com> wrote:



-- 

Daniel Biazus
infrastructure Engineering
Azion Technologies
Porto Alegre, Brasil +55 51 3012 3005 | +55 51 82279032
Miami, USA +1 305 704 8816


Re: Interim cache - High CPU usage

Posted by gang li <po...@gmail.com>.
I don't think it is a good idea to use the interim cache when the cached
objects are very large. Since the SSD is only 120 GB, objects on the HDD
will be migrated to the SSD frequently; the SSD storage becomes meaningless
as it is overwritten quickly, and this increases CPU and I/O consumption.
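A quick back-of-the-envelope check with the sizes from the original post makes the churn concrete:

```shell
# 120 GB interim SSD, 200 MB average object size (figures from the thread)
ssd_mb=$((120 * 1024))
avg_mb=200
echo "objects the interim SSD can hold: $((ssd_mb / avg_mb))"
# prints: objects the interim SSD can hold: 614
```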

Can you give more info from perf, such as a call graph?
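For reference, one way to capture the call graph being asked for here (a sketch; the 30-second sampling window is arbitrary):

```shell
# Sample the running traffic_server with call-graph recording for 30 seconds
perf record -g -p "$(pidof traffic_server)" -- sleep 30
# Summarize, hottest call chains first
perf report --stdio
```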


On Mon, Jan 12, 2015 at 4:52 AM, Daniel Biazus <da...@azion.com>
wrote:
