Posted to issues@trafficserver.apache.org by "Bryan Call (JIRA)" <ji...@apache.org> on 2013/05/20 20:05:16 UTC

[jira] [Comment Edited] (TS-1684) Reduce the usage of global allocation/free lists - switch to using local thread allocation/free lists

    [ https://issues.apache.org/jira/browse/TS-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662180#comment-13662180 ] 

Bryan Call edited comment on TS-1684 at 5/20/13 6:05 PM:
---------------------------------------------------------

ATS results for 3.2.0_11, without the proxy allocator patch attached to this issue:
{code}
-bash-4.2$ cat out | http_load/merge_stats.pl
Total runs: 13
4112097 fetches on 4764 conns, 1300 max parallell, 5.75693e+09 bytes in 30 seconds
1400 mean bytes/fetch
137069.90 fetches/sec, 1.91898e+08 bytes/sec
msecs/connect: 0.180437538461538 mean, 0.662384615384615 max, 0.0604615384615384 min
msecs/first-response: 9.47338384615385 mean, 162.386230769231 max, 0.152307692307692 min



-bash-4.1$ dstat 10
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 72  10   7   0   0  11|   0    26k|  31M  224M|   0     0 | 291k   20k
 74  10   6   0   0  10|   0    24k|  31M  228M|   0     0 | 294k   20k
 72  10   6   1   0  11|   0    41M|  31M  227M|   0     0 | 294k   23k
 71  10   6   1   0  12| 410B   49M|  31M  223M|   0     0 | 295k   21k


Samples: 2M of event 'cycles', Event count (approx.): 628623285644
 23.47%  libtsutil.so.3                [.] ink_freelist_new
 20.22%  libtsutil.so.3                [.] ink_freelist_free
  2.85%  [kernel]                      [k] _spin_lock_bh
  1.51%  traffic_server                [.] HttpSM::_instantiate_func(HttpSM*, Ht
  1.35%  [kernel]                      [k] _spin_lock
  1.03%  traffic_server                [.] HdrHeap::destroy()
  0.72%  libc-2.12.so                  [.] memcpy
  0.72%  traffic_server                [.] HdrHeap::allocate_obj(int, int)
  0.63%  [bnx2x]                       [k] bnx2x_start_xmit
  0.60%  libpthread-2.12.so            [.] pthread_mutex_trylock
  0.56%  [kernel]                      [k] copy_user_generic_string
  0.50%  libpthread-2.12.so            [.] pthread_mutex_unlock
  0.49%  traffic_server                [.] HdrHeap::duplicate_str(char const*, i
  0.49%  traffic_server                [.] write_to_net_io(NetHandler*, UnixNetV
  0.48%  [kernel]                      [k] tcp_ack
  0.47%  traffic_server                [.] PriorityEventQueue::check_ready(long,
  0.46%  traffic_server                [.] HttpSM::cleanup()
  0.46%  traffic_server                [.] IOBufferBlock::free()
  0.44%  traffic_server                [.] mime_hdr_field_find(MIMEHdrImpl*, cha
  0.43%  [kernel]                      [k] skb_release_data
  0.42%  traffic_server                [.] RamCacheCLFUS::get(INK_MD5*, Ptr<IOBu
{code}

ATS results for 3.2.0_12, with the proxy allocator patch attached to this issue:
{code}
-bash-4.2$ cat out | http_load/merge_stats.pl
Total runs: 13
15551539 fetches on 16174 conns, 1300 max parallell, 2.17722e+10 bytes in 60 seconds
1400 mean bytes/fetch
259192.20 fetches/sec, 3.62869e+08 bytes/sec
msecs/connect: 14.0116384615385 mean, 2503.94692307692 max, 0.0573846153846154 min
msecs/first-response: 4.95988153846154 mean, 302.202153846154 max, 0.100384615384615 min




-bash-4.1$ dstat 10
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 62  16   8   0   0  14|   0    36k|  60M  438M|   0     0 | 335k   60k
 60  16   9   0   0  15|   0    34k|  60M  432M|   0     0 | 337k   60k
 61  16   8   0   0  15|   0    20M|  59M  430M|   0     0 | 336k   59k

Samples: 1M of event 'cycles', Event count (approx.): 471148870436
  8.83%  libtsutil.so.3                [.] ink_freelist_free
  4.49%  libtsutil.so.3                [.] ink_freelist_new
  2.24%  traffic_server                [.] HttpSM::_instantiate_func(HttpSM*, Ht
  2.02%  [kernel]                      [k] _spin_lock_bh
  2.01%  [kernel]                      [k] _spin_lock
  1.27%  libc-2.12.so                  [.] memcpy
  1.09%  [bnx2x]                       [k] bnx2x_start_xmit
  1.06%  traffic_server                [.] PriorityEventQueue::check_ready(long,
  1.03%  traffic_server                [.] this_ethread()
  0.99%  [kernel]                      [k] copy_user_generic_string
  0.84%  [kernel]                      [k] tcp_ack
  0.78%  traffic_server                [.] HttpSM::cleanup()
  0.78%  traffic_server                [.] mime_hdr_field_find(MIMEHdrImpl*, cha
  0.71%  traffic_server                [.] HdrHeap::allocate_obj(int, int)
  0.67%  libpthread-2.12.so            [.] pthread_getspecific
  0.66%  traffic_server                [.] RamCacheCLFUS::get(INK_MD5*, Ptr<IOBu
  0.63%  traffic_server                [.] HdrHeap::destroy()
  0.63%  libpthread-2.12.so            [.] pthread_mutex_trylock
  0.61%  [kernel]                      [k] kfree
  0.60%  [bnx2x]                       [k] bnx2x_rx_int
  0.58%  traffic_server                [.] HdrHeap::duplicate_str(char const*, i
  0.52%  traffic_server                [.] read_from_net(NetHandler*, UnixNetVCo
  0.52%  traffic_server                [.] thread_alloc(Allocator&, ProxyAllocat
  0.51%  traffic_server                [.] write_to_net_io(NetHandler*, UnixNetV
  0.51%  libpthread-2.12.so            [.] pthread_mutex_unlock
  0.51%  [kernel]                      [k] tcp_recvmsg
{code}
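
The two runs above used different durations (30 and 60 seconds), so the raw fetch totals are not directly comparable, but the per-second rates and the perf shares are. A minimal sketch of that comparison follows. The struct and function names are hypothetical and exist only for this illustration; the numbers are copied from the output above. The perf percentages come from profiles with different total sample counts, so the share comparison is only a rough indication.

```cpp
#include <cassert>
#include <cstdio>

// Figures copied from the http_load and perf output above.
struct Run {
  double fetches_per_sec;
  double first_response_ms; // mean msecs/first-response
  double freelist_new_pct;  // perf share of ink_freelist_new
  double freelist_free_pct; // perf share of ink_freelist_free
};

constexpr Run kWithoutPatch{137069.90, 9.47338384615385, 23.47, 20.22};
constexpr Run kWithPatch{259192.20, 4.95988153846154, 4.49, 8.83};

// Combined CPU share of the two global free list functions, in percent.
constexpr double freelist_share(const Run &r)
{
  return r.freelist_new_pct + r.freelist_free_pct;
}

// Throughput of `after` relative to `before`.
constexpr double throughput_ratio(const Run &before, const Run &after)
{
  return after.fetches_per_sec / before.fetches_per_sec;
}

// Prints the before/after comparison for the two runs.
void print_comparison(const Run &before, const Run &after)
{
  assert(before.fetches_per_sec > 0.0);
  std::printf("throughput ratio:     %.2fx\n", throughput_ratio(before, after));
  std::printf("freelist CPU share:   %.2f%% -> %.2f%%\n", freelist_share(before), freelist_share(after));
  std::printf("mean first-response:  %.2f ms -> %.2f ms\n", before.first_response_ms, after.first_response_ms);
}
```

Calling print_comparison(kWithoutPatch, kWithPatch) from a main() prints the three comparisons. Note that the mean and max connect times went up with the patch in these runs, which this sketch does not cover.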
                
> Reduce the usage of global allocation/free lists - switch to using local thread allocation/free lists
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TS-1684
>                 URL: https://issues.apache.org/jira/browse/TS-1684
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Bryan Call
>            Assignee: Bryan Call
>             Fix For: 3.3.4
>
>         Attachments: ts-1684.patch
>
>
> When running benchmarks, ink_freelist_new() normally shows up as one of the functions using the most CPU, if not the top one.  ATS currently uses global free lists (via ClassAllocator<>, Allocator, and SparseClassAllocator<>) for some of its memory allocation.
> Below is a count of allocations per allocation path, listed with the "name" given to each allocator.  The benchmark fetches a small object from cache 100k times.
>  400000 ink_freelist_new: hdrHeap
>  300000 ink_freelist_new: hdrStrHeap
>  203541 ink_freelist_new: ioBlockAllocator
>  199616 proxy allocator thread_alloc: eventAllocator
>  103554 ink_freelist_new: ioDataAllocator
>  103554 ink_freelist_new: ioBufAllocator[5]
>  100100 ink_freelist_new: ioAllocator
>  100000 proxy allocator thread_alloc: hdrHeap
>  100000 proxy allocator thread_alloc: cacheVConnection
>  100000 ink_freelist_new: httpSMAllocator
>  100000 ink_freelist_new: ArenaBlock
>   18507 ink_freelist_new: mutexAllocator
>    4772 ink_freelist_new: eventAllocator
>     162 ink_freelist_new: cacheVConnection
>     102 ink_freelist_new: netVCAllocator
>     100 proxy allocator init thread_alloc: httpClientSessionAllocator
>     100 ink_freelist_new: httpClientSessionAllocator
>       1 proxy allocator thread_alloc: RamCacheCLFUSEntry
>       1 ink_freelist_new: RamCacheCLFUSEntry
>       1 ink_freelist_new: hostDBContAllocator
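
The idea in the description, serving allocations from a per-thread list and falling back to the global list only when the local one is empty, can be sketched roughly as below. This is not the code in ts-1684.patch. GlobalFreeList and ThreadCache are hypothetical names invented for this illustration, and a mutex stands in for the real global free list to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical stand-in for a shared, global free list. Every call takes a
// lock, which is the contention the per-thread cache below tries to avoid.
class GlobalFreeList
{
public:
  explicit GlobalFreeList(std::size_t object_size) : object_size_(object_size) {}
  GlobalFreeList(const GlobalFreeList &) = delete;
  GlobalFreeList &operator=(const GlobalFreeList &) = delete;
  ~GlobalFreeList()
  {
    for (void *p : free_) {
      ::operator delete(p);
    }
  }

  void *alloc()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    ++alloc_calls_;
    if (free_.empty()) {
      return ::operator new(object_size_);
    }
    void *p = free_.back();
    free_.pop_back();
    return p;
  }

  void release(void *p)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    free_.push_back(p);
  }

  // Number of allocations that had to go through the global list.
  std::size_t global_alloc_calls() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return alloc_calls_;
  }

private:
  std::size_t object_size_;
  mutable std::mutex mutex_;
  std::vector<void *> free_;
  std::size_t alloc_calls_ = 0;
};

// Hypothetical per-thread cache. One instance is owned by exactly one thread,
// so its own list needs no locking.
class ThreadCache
{
public:
  ThreadCache(GlobalFreeList &global, std::size_t max_cached) : global_(global), max_cached_(max_cached) {}
  ThreadCache(const ThreadCache &) = delete;
  ThreadCache &operator=(const ThreadCache &) = delete;
  ~ThreadCache()
  {
    // Hand cached objects back so the global list can reuse or free them.
    for (void *p : local_) {
      global_.release(p);
    }
  }

  void *alloc()
  {
    if (!local_.empty()) {
      void *p = local_.back();
      local_.pop_back();
      return p; // fast path: no lock, no shared state touched
    }
    return global_.alloc(); // slow path: fall back to the global list
  }

  void release(void *p)
  {
    assert(p != nullptr);
    if (local_.size() < max_cached_) {
      local_.push_back(p); // keep it for this thread's next allocation
    } else {
      global_.release(p); // cap reached: return memory to the shared list
    }
  }

private:
  GlobalFreeList &global_;
  std::size_t max_cached_;
  std::vector<void *> local_;
};
```

In this sketch each thread would own one ThreadCache per allocator, and the cache must be destroyed before the GlobalFreeList it refers to. The max_cached cap is an assumption of the sketch: it bounds how much memory an idle thread can hold back from other threads.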
