You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@trafficserver.apache.org by Bryan Call <bc...@apache.org> on 2012/07/31 05:10:24 UTC

lock contention in cache?

When I am doing benchmarking on ATS I am seeing some requests >= 40ms but the vast majority are taking <= 1ms.  When turning on slow log it looks like the issue is around cache.  is there any option that I can set to mitigate this issue or anything I can do to dig in deeper to find out where the problems is?

[Jul 31 03:01:57.160] Server {0x2ad695824700} ERROR: [36364408] Slow Request: url: http://q4.ycpi.bf1.yahoo.net:8124/200_10_0_cache status: 200 unique id:  bytes: 200 fd: 0 client state: 0 server state: 0 ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: 0.040 dns_lookup_begin: -1.000 dns_lookup_end: -1.000 server_connect: -1.000 server_first_read: -1.000 server_read_header_done: -1.000 server_close: -1.000 ua_close: 0.040 sm_finish: 0.040
[Jul 31 03:01:58.440] Server {0x2ad695824700} ERROR: [36556479] Slow Request: url: http://q4.ycpi.bf1.yahoo.net:8124/200_10_0_cache status: 200 unique id:  bytes: 200 fd: 0 client state: 0 server state: 0 ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: 0.040 dns_lookup_begin: -1.000 dns_lookup_end: -1.000 server_connect: -1.000 server_first_read: -1.000 server_read_header_done: -1.000 server_close: -1.000 ua_close: 0.040 sm_finish: 0.040

Here are the results of the benchmark.  This occurs in less then 1% of the responses:
[client bcall@q1 bench]$ fbench -s 120 -c 2 -n 200 -q "200_%d.query" q2.ycpi.bf1.yahoo.net 8080
Starting clients...
[dummydate]: PROGRESS: fbench: Seconds left 120
[dummydate]: PROGRESS: fbench: Seconds left 60
Stopping clients........................................................................................................................................................................................................
Clients stopped.
*** HTTP keep-alive statistics ***
connection reuse count -- 17774265
***************** Benchmark Summary *****************
clients:                     200
ran for:                     120 seconds
cycle time:                    2 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:    17774465
cycles not held:           20219
minimum response time:      0.09 ms
maximum response time:     84.21 ms
average response time:      0.29 ms
25 percentile:              0.00 ms
50 percentile:              0.00 ms
75 percentile:              0.00 ms
90 percentile:              0.00 ms
95 percentile:              0.00 ms
99 percentile:              1.00 ms
actual query rate:      148175.09 Q/s
utilization:               21.41 %
zero hit queries:              0

-Bryan

Re: lock contention in cache?

Posted by Bryan Call <bc...@apache.org>.

In the test I did on another box, harder to replace the binaries on the box I was benchmarking, I am see the same Vol* come back on the different URLs I tested.  It looks like it is using only 1 mutex for locking.  I have 1GB of ram cache and 10GB of disk cache in this test, if that matters.
[Jul 30 21:41:00.463] Server {0x7f333e441700} DEBUG: (bcall) key_to_vol hash_table: 0x7f3318001e00 hash: 20490 vol: 0x7f3328004610
[Jul 30 21:41:01.793] Server {0x7f333e23f700} DEBUG: (bcall) key_to_vol hash_table: 0x7f3318001e00 hash: 9142 vol: 0x7f3328004610
[Jul 30 21:54:07.196] Server {0x7f333de3b700} DEBUG: (bcall) key_to_vol hash_table: 0x7f3318001e00 hash: 304 vol: 0x7f3328004610
[Jul 30 21:54:10.775] Server {0x7f333dd3a700} DEBUG: (bcall) key_to_vol hash_table: 0x7f3318001e00 hash: 28827 vol: 0x7f3328004610
[Jul 30 21:54:18.033] Server {0x7f333f8c3840} DEBUG: (bcall) key_to_vol hash_table: 0x7f3318001e00 hash: 20693 vol: 0x7f3328004610

changed code in key_to_vol:
    Vol *tmp = host_rec->vols[hash_table[h]];
    Debug("bcall", "key_to_vol hash_table: %p hash: %d vol: %p", hash_table, h, tmp);
    return tmp;

-Bryan



On Jul 30, 2012, at 8:10 PM, Bryan Call <bc...@apache.org> wrote:

> When I am doing benchmarking on ATS I am seeing some requests >= 40ms but the vast majority are taking <= 1ms.  When turning on slow log it looks like the issue is around cache.  is there any option that I can set to mitigate this issue or anything I can do to dig in deeper to find out where the problems is?
> 
> [Jul 31 03:01:57.160] Server {0x2ad695824700} ERROR: [36364408] Slow Request: url: http://q4.ycpi.bf1.yahoo.net:8124/200_10_0_cache status: 200 unique id:  bytes: 200 fd: 0 client state: 0 server state: 0 ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: 0.040 dns_lookup_begin: -1.000 dns_lookup_end: -1.000 server_connect: -1.000 server_first_read: -1.000 server_read_header_done: -1.000 server_close: -1.000 ua_close: 0.040 sm_finish: 0.040
> [Jul 31 03:01:58.440] Server {0x2ad695824700} ERROR: [36556479] Slow Request: url: http://q4.ycpi.bf1.yahoo.net:8124/200_10_0_cache status: 200 unique id:  bytes: 200 fd: 0 client state: 0 server state: 0 ua_begin: 0.000 ua_read_header_done: 0.000 cache_open_read_begin: 0.000 cache_open_read_end: 0.040 dns_lookup_begin: -1.000 dns_lookup_end: -1.000 server_connect: -1.000 server_first_read: -1.000 server_read_header_done: -1.000 server_close: -1.000 ua_close: 0.040 sm_finish: 0.040
> 
> Here are the results of the benchmark.  This occurs in less then 1% of the responses:
> [client bcall@q1 bench]$ fbench -s 120 -c 2 -n 200 -q "200_%d.query" q2.ycpi.bf1.yahoo.net 8080
> Starting clients...
> [dummydate]: PROGRESS: fbench: Seconds left 120
> [dummydate]: PROGRESS: fbench: Seconds left 60
> Stopping clients........................................................................................................................................................................................................
> Clients stopped.
> *** HTTP keep-alive statistics ***
> connection reuse count -- 17774265
> ***************** Benchmark Summary *****************
> clients:                     200
> ran for:                     120 seconds
> cycle time:                    2 ms
> lower response limit:          0 bytes
> skipped requests:              0
> failed requests:               0
> successful requests:    17774465
> cycles not held:           20219
> minimum response time:      0.09 ms
> maximum response time:     84.21 ms
> average response time:      0.29 ms
> 25 percentile:              0.00 ms
> 50 percentile:              0.00 ms
> 75 percentile:              0.00 ms
> 90 percentile:              0.00 ms
> 95 percentile:              0.00 ms
> 99 percentile:              1.00 ms
> actual query rate:      148175.09 Q/s
> utilization:               21.41 %
> zero hit queries:              0
> 
> -Bryan
> 
> 
>