Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/05/14 00:44:15 UTC

[GitHub] [trafficserver] moseleymark commented on issue #7471: ATS 9.0.1: current_server_connections, current_server_transactions metrics broken

moseleymark commented on issue #7471:
URL: https://github.com/apache/trafficserver/issues/7471#issuecomment-840919253


   I've been running 9.0.1 for about 24 hours and I'm seeing this as well. The internal connection count seems to just tick up and up, but the connections don't appear to actually exist. Running traffic_ctl over and over, I do see the count go *down* at times, but the overall trend is to grow and grow.
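   
   For anyone wanting to watch the trend, a crude sampling loop like this is what I mean by running it "over and over" (the 60-second interval is arbitrary):
   
   # while true; do date; traffic_ctl metric get proxy.process.net.connections_currently_open; sleep 60; done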
   
   15867 is the pid of traffic_server
   
   # lsof -n -p 15867 | wc -l
   106540
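   
   (Note that plain lsof counts every open fd, not just sockets; narrowing it to internet sockets, with -a to AND the -p and -i filters, should give a tighter comparison:)
   
   # lsof -n -a -p 15867 -i | wc -l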
   
   # traffic_ctl metric get proxy.process.net.connections_currently_open
   proxy.process.net.connections_currently_open 108760
   
   # traffic_ctl metric get proxy.process.http.current_server_connections
   proxy.process.http.current_server_connections 160043
   
   # traffic_ctl metric get proxy.process.current_server_connections
   proxy.process.current_server_connections 159993
   
   # We don't use parents so this is normal...
   # traffic_ctl metric get proxy.process.http.current_parent_proxy_connections
   proxy.process.http.current_parent_proxy_connections 0
   
   # netstat -ant | wc -l
   25203
   
   # netstat -antp | grep 15867 | wc -l
   8640
   
   So somehow ATS thinks it has roughly 4x the number of connections that the kernel thinks the entire box has (108760 vs 25203), and roughly 12x the number of connections that the kernel attributes to the traffic_server process itself (108760 vs 8640).
   
   I'm also seeing a ton more CLOSE_WAITs compared to 8.1.1:
   
   # netstat -ant | grep CLOSE | wc -l
   4462
   
   On the other boxes in the same 3-node cluster (servicing the same VIPs and same backend origin sites, so extremely similar workload):
   
   # netstat -ant | grep CLOSE_WAIT | wc -l
   39
   
   # netstat -ant | grep CLOSE_WAIT | wc -l
   148
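   
   (For a fuller comparison than just CLOSE_WAIT, a per-state breakdown like this should work against Linux netstat output, where the TCP state is the sixth column:)
   
   # netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn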
   
   All my testing looked good (throughput is in line with the other 2 nodes in the cluster, sites load fine, etc.), and I wouldn't have even noticed this except that we bumped into the connection limit:
   
   [May 13 10:01:50.185] [ACCEPT 0:8080] WARNING: too many connections, throttling.  connection_type=ACCEPT, current_connections=262149, net_connections_throttle=262144
   [May 13 10:11:50.188] [ACCEPT 0:8443] WARNING: too many connections, throttling.  connection_type=ACCEPT, current_connections=262169, net_connections_throttle=262144
   [May 13 10:21:50.197] [ACCEPT 0:8443] WARNING: too many connections, throttling.  connection_type=ACCEPT, current_connections=262263, net_connections_throttle=262144
   [May 13 10:31:50.198] [ACCEPT 0:8443] WARNING: too many connections, throttling.  connection_type=ACCEPT, current_connections=262208, net_connections_throttle=262144
   [May 13 10:41:50.199] [ACCEPT 0:8080] WARNING: too many connections, throttling.  connection_type=ACCEPT, current_connections=262147, net_connections_throttle=262144
   
   I raised proxy.config.net.connections_throttle but the current count seems to creep up and up. I was scouring the upgrade docs to see if I missed something (which could still be the case) and came upon this ticket, so I figured I'd add some info.
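   
   (For anyone else hitting the same wall, bumping the throttle looks something like the below; 500000 is just an example value, and it only buys time if the count keeps climbing:)
   
   # traffic_ctl config set proxy.config.net.connections_throttle 500000
   # traffic_ctl config reload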
   
   No idea if it's related to this, but in 9.0.1 I also see a steady stream of entries like:
   
   20210513.20h26m36s BODY_FACTORY: suppressing 'connect#failed_connect' response for url 'X@'
   
   and
   
   20210513.20h26m41s BODY_FACTORY: suppressing 'connect#failed_connect' response for url ''
   
   My 8.1.1 nodes (same VIP, same workload, same backend origin sites) don't have a single equivalent log entry. They of course have entries for other URLs, but always for valid-ish looking URLs, never '' nor 'X@'. I've yet to find a corresponding log entry for the nginx instance that's fronting ATS. Again, I just mention this on the off chance that it's related to the connection count issue above.
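   
   (For comparing rates across boxes, this counts the entries; I'm assuming the default diags.log location, so adjust for your layout:)
   
   # grep -c "suppressing 'connect#failed_connect'" /var/log/trafficserver/diags.log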


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org