Posted to users@trafficserver.apache.org by edd! <me...@eddi.me> on 2020/04/26 23:32:34 UTC

cache read and write failures

Hi,

I am new to ATS. I compiled 8.0.7 from source yesterday on CentOS 7:
# source /opt/rh/devtoolset-7/enable
# ./configure --enable-experimental-plugins
# make && make install
Testing it as a transparent forward proxy serving ~500 users with HTTP
caching enabled. I first tried raw cache storage and then a volume file, but
in both cases I get a large number of read failures and a few write failures
on a 100 GB SSD partition.
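
For reference, the two storage.config layouts in question look roughly like
this (the device name and size here are only illustrative):

# raw partition span; ATS sizes it from the device itself
/dev/sdb1

# file/directory span with an explicit size in bytes (~100 GB)
/cache 107374182400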

"proxy.process.cache.volume_0.bytes_used": "3323351040",
"proxy.process.cache.volume_0.bytes_total": "106167836672",
"proxy.process.cache.volume_0.ram_cache.total_bytes": "12884901888",
"proxy.process.cache.volume_0.ram_cache.bytes_used": "6062080",
"proxy.process.cache.volume_0.ram_cache.hits": "4916",
"proxy.process.cache.volume_0.ram_cache.misses": "1411",
"proxy.process.cache.volume_0.pread_count": "0",
"proxy.process.cache.volume_0.percent_full": "3",
"proxy.process.cache.volume_0.lookup.active": "0",
"proxy.process.cache.volume_0.lookup.success": "0",
"proxy.process.cache.volume_0.lookup.failure": "0",
"proxy.process.cache.volume_0.read.active": "1",
"proxy.process.cache.volume_0.read.success": "5566",
"proxy.process.cache.volume_0.read.failure": "22084",
"proxy.process.cache.volume_0.write.active": "8",
"proxy.process.cache.volume_0.write.success": "5918",
"proxy.process.cache.volume_0.write.failure": "568",
"proxy.process.cache.volume_0.write.backlog.failure": "272",
"proxy.process.cache.volume_0.update.active": "1",
"proxy.process.cache.volume_0.update.success": "306",
"proxy.process.cache.volume_0.update.failure": "4",
"proxy.process.cache.volume_0.remove.active": "0",
"proxy.process.cache.volume_0.remove.success": "0",
"proxy.process.cache.volume_0.remove.failure": "0",
"proxy.process.cache.volume_0.evacuate.active": "0",
"proxy.process.cache.volume_0.evacuate.success": "0",
"proxy.process.cache.volume_0.evacuate.failure": "0",
"proxy.process.cache.volume_0.scan.active": "0",
"proxy.process.cache.volume_0.scan.success": "0",
"proxy.process.cache.volume_0.scan.failure": "0",
"proxy.process.cache.volume_0.direntries.total": "13255088",
"proxy.process.cache.volume_0.direntries.used": "9230",
"proxy.process.cache.volume_0.directory_collision": "0",
"proxy.process.cache.volume_0.frags_per_doc.1": "5372",
"proxy.process.cache.volume_0.frags_per_doc.2": "0",
"proxy.process.cache.volume_0.frags_per_doc.3+": "625",
"proxy.process.cache.volume_0.read_busy.success": "4",
"proxy.process.cache.volume_0.read_busy.failure": "351",

Disk read/write tests:
hdparm -t /dev/sdb1
/dev/sdb1:
 Timing buffered disk reads: 196 MB in  3.24 seconds =  60.54 MB/sec

hdparm -T /dev/sdb1
/dev/sdb1:
 Timing cached reads:   11662 MB in  1.99 seconds = 5863.27 MB/sec

dd if=/dev/zero of=/cache/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 67.6976 s, 15.9 MB/s

dd if=/dev/zero of=/cache/test2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.374173 s, 1.4 MB/s
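
A small random-write fio run may be more representative of the cache's I/O
pattern than dd with oflag=dsync; a rough sketch, assuming fio is installed
and /cache is the cache partition's mount point (remove /cache/fio.test
afterwards):

fio --name=cache-write-test --filename=/cache/fio.test \
    --rw=randwrite --bs=8k --direct=1 --ioengine=libaio \
    --size=1G --runtime=30 --time_based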

Please help,

Thank you,
Eddi

Re: cache read and write failures

Posted by edd! <me...@eddi.me>.
I agree

Thank you guys

On Thu, 14 May 2020, 03:29 Leif Hedstrom, <zw...@apache.org> wrote:

> Also, all of this was discussed on the issue that was opened :). I think
> we should close it; these metrics work exactly as expected, albeit in a
> somewhat confusing way.
>
> — Leif
>
> On May 13, 2020, at 17:51, Alan Carroll <so...@verizonmedia.com>
> wrote:
>
> 
> Cache misses count as read failures, so you should expect a lot of those on
> an empty cache. Write failures can be caused by clients giving up on the
> transaction. So I'd take those numbers with a bit of caution. If actual
> system-level read/write failures happen more than 4 or 5 times in a row,
> the disk is taken offline.

Re: cache read and write failures

Posted by Leif Hedstrom <zw...@apache.org>.
Also, all of this was discussed on the issue that was opened :). I think we should close it; these metrics work exactly as expected, albeit in a somewhat confusing way.

— Leif 


Re: cache read and write failures

Posted by Alan Carroll <so...@verizonmedia.com>.
Cache misses count as read failures, so you should expect a lot of those on
an empty cache. Write failures can be caused by clients giving up on the
transaction. So I'd take those numbers with a bit of caution. If actual
system-level read/write failures happen more than 4 or 5 times in a row,
the disk is taken offline.
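
A quick way to tell misses apart from real I/O problems is to look at the
span error metrics instead of read.failure; for example (assuming
traffic_ctl is on the PATH and prints the usual "name value" pairs):

# non-zero span.errors / span.failing counters point at real disk trouble
traffic_ctl metric match 'proxy.process.cache.*span'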


Re: cache read and write failures

Posted by edd! <me...@eddi.me>.
Interesting figures, especially since the failures seem to persist across
different major releases.
I initially thought the failures occurred because I am testing ATS in a VM.
So, yesterday I installed the same OS and ATS release, compiled the source
the same way, on a bare-metal HP server with 2x 240 GB SSDs in RAID 1 for the
OS and 2x 1 TB drives, each in its own RAID 0 volume, for the cache
storage... However, I still see failures (see below):

"proxy.process.version.server.short": "8.0.7",
"proxy.process.version.server.long": "Apache Traffic Server -
traffic_server - 8.0.7 - (build # 042720 on Apr 27 2020 at 20:11:28)",
"proxy.process.version.server.build_number": "042720",
"proxy.process.version.server.build_time": "20:11:28",
"proxy.process.version.server.build_date": "Apr 27 2020",
"proxy.process.version.server.build_person": "root",
"proxy.process.http.background_fill_current_count": "4",
"proxy.process.http.current_client_connections": "640",
"proxy.process.http.current_active_client_connections": "122",
"proxy.process.http.websocket.current_active_client_connections": "0",
"proxy.process.http.current_client_transactions": "261",
"proxy.process.http.current_server_transactions": "184",
"proxy.process.http.current_parent_proxy_connections": "0",
"proxy.process.http.current_server_connections": "805",
"proxy.process.http.current_cache_connections": "155",

"proxy.process.cache.volume_0.bytes_used": "65715343360",
"proxy.process.cache.volume_0.bytes_total": "1917932191744",
"proxy.process.cache.volume_0.ram_cache.total_bytes": "25769803776",
"proxy.process.cache.volume_0.ram_cache.bytes_used": "770314880",
"proxy.process.cache.volume_0.ram_cache.hits": "541749",
"proxy.process.cache.volume_0.ram_cache.misses": "115283",
"proxy.process.cache.volume_0.pread_count": "0",
"proxy.process.cache.volume_0.percent_full": "3",
"proxy.process.cache.volume_0.lookup.active": "0",
"proxy.process.cache.volume_0.lookup.success": "0",
"proxy.process.cache.volume_0.lookup.failure": "0",
"proxy.process.cache.volume_0.read.active": "4",
"proxy.process.cache.volume_0.read.success":
"525727","proxy.process.cache.volume_0.read.failure": "1290873",
"proxy.process.cache.volume_0.write.active": "153",
"proxy.process.cache.volume_0.write.success":
"222118","proxy.process.cache.volume_0.write.failure": "42813",
"proxy.process.cache.volume_0.write.backlog.failure": "0",
"proxy.process.cache.volume_0.update.active": "1",
"proxy.process.cache.volume_0.update.success":
"33634","proxy.process.cache.volume_0.update.failure": "533",
"proxy.process.cache.volume_0.remove.active": "0",
"proxy.process.cache.volume_0.remove.success": "0",
"proxy.process.cache.volume_0.remove.failure": "0",
"proxy.process.cache.volume_0.evacuate.active": "0",
"proxy.process.cache.volume_0.evacuate.success": "0",
"proxy.process.cache.volume_0.evacuate.failure": "0",
"proxy.process.cache.volume_0.scan.active": "0",
"proxy.process.cache.volume_0.scan.success": "0",
"proxy.process.cache.volume_0.scan.failure": "0",
"proxy.process.cache.volume_0.direntries.total": "239453928",
"proxy.process.cache.volume_0.direntries.used": "274305",
"proxy.process.cache.volume_0.directory_collision": "0",
"proxy.process.cache.volume_0.frags_per_doc.1": "228249",
"proxy.process.cache.volume_0.frags_per_doc.2": "0",
"proxy.process.cache.volume_0.frags_per_doc.3+": "10884",
"proxy.process.cache.volume_0.read_busy.success":
"105","proxy.process.cache.volume_0.read_busy.failure": "52600",
"proxy.process.cache.volume_0.write_bytes_stat": "0",
"proxy.process.cache.volume_0.vector_marshals": "0",
"proxy.process.cache.volume_0.hdr_marshals": "0",
"proxy.process.cache.volume_0.hdr_marshal_bytes": "0",
"proxy.process.cache.volume_0.gc_bytes_evacuated": "0",
"proxy.process.cache.volume_0.gc_frags_evacuated": "0",
"proxy.process.cache.volume_0.wrap_count": "0",
"proxy.process.cache.volume_0.sync.count": "252",
"proxy.process.cache.volume_0.sync.bytes": "302145806336",
"proxy.process.cache.volume_0.sync.time": "72939367496484",
"proxy.process.cache.volume_0.span.errors.read": "0",
"proxy.process.cache.volume_0.span.errors.write": "0",
"proxy.process.cache.volume_0.span.failing": "0",
"proxy.process.cache.volume_0.span.offline": "0",
"proxy.process.cache.volume_0.span.online": "0",
"server": "8.0.7"



Re: cache read and write failures

Posted by Bryan Call <bc...@apache.org>.
Here are some numbers from our production servers, from a couple of different groups.  We are running a heavily modified version of 7.1.2.

One of our production servers for our CDN:
proxy.process.cache.read.success 6037258
proxy.process.cache.read.failure 13845799

Here is another one from another group:
proxy.process.cache.read.success 5575072
proxy.process.cache.read.failure 26784750

I talked to another company, running 8.0.7, and they were seeing about a 3% failure rate on their cache. I created an issue for this: https://github.com/apache/trafficserver/issues/6713
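
For reference, the failure ratio can be computed straight from those
counters; a sketch, assuming traffic_ctl metric get accepts a list of metric
names and prints "name value" pairs like the ones above:

traffic_ctl metric get proxy.process.cache.read.success proxy.process.cache.read.failure \
  | awk '{v[$1]=$2} END {print 100 * v["proxy.process.cache.read.failure"] / (v["proxy.process.cache.read.success"] + v["proxy.process.cache.read.failure"]), "% of reads failed"}'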

-Bryan


Re: stats_over_http.so is causing a loop

Posted by Alan Carroll <so...@verizonmedia.com>.
It is expected that ATS would loop, as the address you put in the browser
is where ATS will try to connect. If that's the ATS address, then it will
connect to itself and loop. With a correctly configured stats_over_http, the
plugin prevents ATS from connecting outbound, which prevents the loop;
instead, the plugin serves the response directly. Reading the documentation,
it looks to me like you should configure just the plugin and not have a
remap rule for it.
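
In other words, keep only the plugin entry and drop the remap rule; roughly
(the path argument and address are the ones from the earlier mail):

plugin.config: stats_over_http.so statz

and then fetch the stats directly from the proxy, e.g.:
curl http://10.0.34.1/statz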


Re: stats_over_http.so is causing a loop

Posted by edd! <me...@eddi.me>.
I removed stats_over_http from plugin.config and removed the map rule, but
still, whenever I enter the Traffic Server IP in a web browser it loops
endlessly. So it has nothing to do with the stats_over_http plugin.


Re: stats_over_http.so is causing a loop

Posted by Alan Carroll <so...@verizonmedia.com>.
I don't understand your remap rule. According to the documentation, there's
no need to do a remap at all. Even if it were needed, I think the path
"{cache}/statz" isn't going to work, as the "{}" paths are special and
don't generally have additional path information. If you leave out the
remap rule, does it still work?


Re: stats_over_http.so is causing a loop

Posted by edd! <me...@eddi.me>.
Another interesting and dangerous addition to the above: I removed the
plugin and the map, then requested the ATS server IP in a browser
(http://10.0.34.1), and the loop occurred as well.


stats_over_http.so is causing a loop

Posted by edd! <me...@eddi.me>.
Hi,

On ATS 8.0.7 stats_over_http.so is causing a loop; here's the config:
plugin.config: stats_over_http.so statz
remap.config: map http://10.0.34.1:80/statz http://{cache}/statz @action=allow @src_ip=10.10.10.10

As soon as I request the page I get countless errors in access.log
10.0.34.1 | ERR_CONNECT_FAIL | NONE | MISS | WL_MISS | 502 | 478 | GET |
http://10.0.34.1/favicon.ico | - | DIRECT/10.0.34.1 | text/html
10.0.34.1 | TCP_MISS | NONE | MISS | WL_MISS | 403 | 442 | GET |
http://10.0.34.1/statiztikz | - | DIRECT/10.0.34.1 | text/html

and in diags.log:
[ACCEPT 0:80] WARNING: too many connections, throttling.
 connection_type=ACCEPT, current_connections=30002,
net_connections_throttle=30000

Only restarting trafficserver stops this loop.

Thank you,
Eddi
