You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/04/16 14:44:55 UTC

[GitHub] [trafficserver] smalenfant opened a new issue #7710: connect_attempts_timeout not observed

smalenfant opened a new issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710


   After we upgraded from ATS 7.1.x to ATS 8.1.x, our backbone team went into several maintenances and noted impact on our CDN delivery. After the 3rd night of it, we finally found that something was wrong with the new installed code.
   
   ```
   # cat /opt/trafficserver/etc/trafficserver/records.config | grep connect_attempts_timeout
   CONFIG proxy.config.http.connect_attempts_timeout INT 2
   CONFIG proxy.config.http.parent_proxy.connect_attempts_timeout INT 2
   ```
   
   To make it short, the connect_attempts_timeout (at least on parent selection) or not observed under 2 scenarios:
   
   - If ICMP comes back with "No Route to Host". This might impact the timeouts. We've seen under ARP failures that the timeout could be from 3 to 6 seconds (single to dual stack).
   - If a host can't connect (Example: iptables DROP on port 80), the timeouts observed were 60 seconds. 
   
   Here's the results for ATS7 (7.1.1.10 to be specific) - Expected results within 4 seconds (OK):
   
   ```
   $ curl -I 'http://cdn1cdedge0002.hidden.cdn1.coxlab.net/test.mpg' -s -o /dev/null -w "Connect: %{time_connect} DNS: %{time_namelookup}  TTFB: %{time_starttransfer} Total time: %{time_total}"
   Connect: 1.005 DNS: 0.004  TTFB: 3.984 Total time: 3.984
   
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) In ParentConfigParams::findParent(): parent_table: 0x1ac6050.
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) policy.ParentEnable: 1
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) Matched with 0x1be6c08 parent node from line 67
   [Apr 16 13:46:46.661] Server {0x2b28de030700} DEBUG: (parent_select) Matched with 0x1be6158 parent node from line 29
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy.
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) Chosen parent: cdn1cdmid0003.coxlab.net.80
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) PARENT_SPECIFIED
   [Apr 16 13:46:46.662] Server {0x2b28de030700} DEBUG: (parent_select) Result for origin.rd.at.cox.net was parent cdn1cdmid0003.coxlab.net:80
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Starting ParentConsistentHash::markParentDown()
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Parent fail count increased to 1 for cdn1cdmid0003.coxlab.net:80
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) ParentConfigParams::nextParent(): parent_table: 0x1ac6050, result->rec: 0x1be6158
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 2, tablePtr: 0x1ac6050
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Calling selectParent() from nextParent
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy.
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80
   [Apr 16 13:46:49.629] Server {0x2b28de030700} DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was parent cdn1cdmid0004.coxlab.net:80
   ```
   
   Here's the results for ATS8 (8.1.2 to be specific) - Expected results within 4 seconds (NOT OK - took 60 seconds):
   
   ```
   $ curl -I 'http://cdn1cdedge0004.hidden.cdn1.coxlab.net/test.mpg' -s -o /dev/null -w "Connect: %{time_connect} DNS: %{time_namelookup}  TTFB: %{time_starttransfer} Total time: %{time_total}"
   Connect: 0.005 DNS: 0.004  TTFB: 60.567 Total time: 60.567
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) In ParentConfigParams::findParent(): parent_table: 0x1934050.
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) policy.ParentEnable: 1
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Matched with 0x1a7e140 parent node from line 45
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Matched with 0x1a7d498 parent node from line 18
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy.
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Initial parent lookups: 1
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Additional parent lookups: 1
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Chosen parent: cdn1cdmid0003.coxlab.net.80
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) PARENT_SPECIFIED
   Apr 16 13:44:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:44:37.160] [ET_NET 19] DEBUG: (parent_select) Result for origin.rd.at.cox.net was parent cdn1cdmid0003.coxlab.net:80
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Parent fail count increased to 1 for cdn1cdmid0003.coxlab.net:80
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) ParentConfigParams::nextParent(): parent_table: 0x1934050, result->rec: 0x1a7d498
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 2, tablePtr: 0x1934050
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Calling selectParent() from nextParent
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy.
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Initial parent lookups: 1
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Additional parent lookups: 1
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80
   Apr 16 13:45:28 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:28.719] [ET_NET 0] DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was parent cdn1cdmid0004.coxlab.net:80
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Parent fail count increased to 2 for cdn1cdmid0003.coxlab.net:80
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) ParentConfigParams::nextParent(): parent_table: 0x1934050, result->rec: 0x1a7d498
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) ParentConfigParams::nextParent(): result->r: 2, tablePtr: 0x1934050
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Calling selectParent() from nextParent
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) ParentConsistentHash::selectParent(): Using a consistent hash parent selection strategy.
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Initial parent lookups: 1
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Additional parent lookups: 1
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Chosen parent: cdn1cdmid0004.coxlab.net.80
   Apr 16 13:45:37 cdn1cdedge0004 traffic_manager: [Apr 16 13:45:37.714] [ET_NET 19] DEBUG: (parent_select) Retry result for origin.rd.at.cox.net was parent cdn1cdmid0004.coxlab.net:80
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall commented on issue #7710: connect_attempts_timeout not observed

Posted by GitBox <gi...@apache.org>.
bryancall commented on issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710#issuecomment-1061297112


   This issue doesn't look like it will be fixed in the 8.1.x branch.  Evan had crashes with the patches to fix the connect timeout.  The connect timeout has been fixed in the 9.1.x branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] jrushford commented on issue #7710: connect_attempts_timeout not observed

Posted by GitBox <gi...@apache.org>.
jrushford commented on issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710#issuecomment-822574607


   @smalenfant @ezelkow1 Is going to bring these two patches into 8.1 which we confirmed fixes this issue.
   
   9b47aa6799db12fd302ea58e3ae0ba8485fd0bbe: proxy.config.http.connect_attempts_timeout tracks TTBF instead of connect (Susan Hinrichs)
   9a8998e98f401d9e52202cfbd4b9d30fa2091e9c: HttpSM: add case check for VC_EVENT_INACTIVE_TIMEOUT, VC_EVENT_READ_READY, VC_EVENT_READ_COMPLETE (Brian Olsen)
   
   @smalenfant I think once @ezelkow1 brings them in we can close this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on issue #7710: connect_attempts_timeout not observed

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710#issuecomment-866359819


   After running with these many in the community have had crashes appear which reverting these changes appears to fix. So we will be backing them out from 8.1.x before release


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] smalenfant commented on issue #7710: connect_attempts_timeout not observed

Posted by GitBox <gi...@apache.org>.
smalenfant commented on issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710#issuecomment-1008889833


   Just compiled 8.1.3 and tried in my lab. The problem is present after patch been reversed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on issue #7710: connect_attempts_timeout not observed

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710#issuecomment-832123268


   @smalenfant I have merged in those fixes into 8.1.x, just checking if you are good now. If so can we close this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall closed issue #7710: connect_attempts_timeout not observed

Posted by GitBox <gi...@apache.org>.
bryancall closed issue #7710:
URL: https://github.com/apache/trafficserver/issues/7710


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org