You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/05/15 23:05:49 UTC

[GitHub] [trafficserver] SolidWallOfCode opened a new pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

SolidWallOfCode opened a new pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843


   In situations where DNS is lightly used, or when there is only one FQDN requested, if a nameserver fails there is no failover to another nameserver.
   
   The underlying cause is recovery depends on actions triggered from `DNSHandler::mainEvent`. This means if a nameserver fails on a request, it takes another request for a different FQDN to trigger recovery. In a sidecar situation where only one FQDN is every requested, new requests for that FQDN stack up on the collapsing queue for the failed nameserver and ATS appears to become unresponsive. This fix makes sure that if there is a nameserver failure, `DNSHandler::mainEvent` is invoked at least once after the failure. If that happens, recovery becomes self sustaining recovery actions cause subsequent invocations of `DNSHandler::mainEvent`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032696510


   [approve ci centos]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-852196572


   I think I've tracked down all the required changes. Fundamentally the issue was
   
   1. In the non-RR case, the failed nameserver was not marked as down, and so HostDB requests for the same FQDN would pile up without triggering nameserver failover.
   1. The failover checks would indicate no need for failover even if the the namserver was marked down.
   
   As minor other cleanups
   
   *  The incorrect use of `schedule_at` was replaced by `schedule_in`.
   *  The down flag is cleared after successfully adding the nameserver connection to `epoll`, not before.
   *  Rate limiting for rescheduling `mainEvent` for retries was added, otherwise the number of scheduled events grew gradually but without apparent bound.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
bryancall commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-868704938


   @SolidWallOfCode is going to pull this into our internal tree on the 9.1.x branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] randall commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
randall commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-966451670


   [approve ci]
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032661069


   [approve ci centos]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] maskit commented on a change in pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
maskit commented on a change in pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#discussion_r633166216



##########
File path: iocore/dns/DNS.cc
##########
@@ -766,6 +766,8 @@ DNSHandler::failover()
     ip_text_buffer buff;
     Warning("failover: connection to DNS server %s lost, retrying", ats_ip_ntop(&ip.sa, buff, sizeof(buff)));
   }
+  // Make sure retries are done even if no more requests.
+  this_ethread()->schedule_at(this, DNS_PRIMARY_RETRY_PERIOD);

Review comment:
       Shouldn't it be `schedule_in`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
zwoop commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1042442403


   This was reverted in 92d238a8fad483c58bc787f784b2ceae73aed532


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode removed a comment on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode removed a comment on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-962502476


   [approve ci]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-842667247


   Something is still not quite right. Further testing reveals some oddities I need to track down. At this point I think the problem is that in the case where the nameservers are not round robin, a down nameserver is never actually marked down.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] maskit commented on a change in pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
maskit commented on a change in pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#discussion_r633166216



##########
File path: iocore/dns/DNS.cc
##########
@@ -766,6 +766,8 @@ DNSHandler::failover()
     ip_text_buffer buff;
     Warning("failover: connection to DNS server %s lost, retrying", ats_ip_ntop(&ip.sa, buff, sizeof(buff)));
   }
+  // Make sure retries are done even if no more requests.
+  this_ethread()->schedule_at(this, DNS_PRIMARY_RETRY_PERIOD);

Review comment:
       Shouldn't it be `schedule_in`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on a change in pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
zwoop commented on a change in pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#discussion_r635307932



##########
File path: iocore/dns/DNS.cc
##########
@@ -766,6 +766,8 @@ DNSHandler::failover()
     ip_text_buffer buff;
     Warning("failover: connection to DNS server %s lost, retrying", ats_ip_ntop(&ip.sa, buff, sizeof(buff)));
   }
+  // Make sure retries are done even if no more requests.
+  this_ethread()->schedule_at(this, DNS_PRIMARY_RETRY_PERIOD);

Review comment:
       Yeh, that looks wrong, examining the code, _at vs _in is clear. If you want to use _at, it would need to be
   
   ```
   this_ethread()->schedule_at(this, Thread::get_hrtime() + DNS_PRIMARY_RETRY_PERIOD);
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode removed a comment on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode removed a comment on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032137643






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
bryancall commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032104150


   [approve ci clang-format]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-918654572


   Looks some one else might be seeing this - Nir Finkel on the chat. He's currently going to do some tests to hopefully verify this is the problem and then maybe try out this fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall removed a comment on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
bryancall removed a comment on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032104150


   [approve ci clang-format]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032109100


   [approve ci]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on a change in pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on a change in pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#discussion_r633685275



##########
File path: iocore/dns/DNS.cc
##########
@@ -766,6 +766,8 @@ DNSHandler::failover()
     ip_text_buffer buff;
     Warning("failover: connection to DNS server %s lost, retrying", ats_ip_ntop(&ip.sa, buff, sizeof(buff)));
   }
+  // Make sure retries are done even if no more requests.
+  this_ethread()->schedule_at(this, DNS_PRIMARY_RETRY_PERIOD);

Review comment:
       Interesting question. The code is cloned from [here](https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1021) in `DNSHandler::mainEvent`. Maybe that's broken too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032137643






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode removed a comment on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode removed a comment on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032661069


   [approve ci centos]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
zwoop commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1042437853


   Cherry-picked to v9.2.x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode removed a comment on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode removed a comment on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-964297737






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-964298153


   [approve ci fedora]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on a change in pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on a change in pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#discussion_r643184758



##########
File path: iocore/dns/DNS.cc
##########
@@ -524,12 +524,12 @@ DNSHandler::open_con(sockaddr const *target, bool failed, int icon, bool over_tc
     }
     return false;
   } else {
-    ns_down[icon] = 0;
     if (cur_con.eio.start(pd, &cur_con, EVENTIO_READ) < 0) {
       Error("[iocore_dns] open_con: Failed to add %d server to epoll list\n", icon);
     } else {
-      cur_con.num = icon;
-      Debug("dns", "opening connection %s SUCCEEDED for %d", ip_text, icon);
+      cur_con.num   = icon;
+      ns_down[icon] = 0;

Review comment:
       Note - don't clear the down flag if this nameserver connection isn't in the `epoll`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on a change in pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on a change in pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#discussion_r641580152



##########
File path: iocore/dns/DNS.cc
##########
@@ -766,6 +766,8 @@ DNSHandler::failover()
     ip_text_buffer buff;
     Warning("failover: connection to DNS server %s lost, retrying", ats_ip_ntop(&ip.sa, buff, sizeof(buff)));
   }
+  // Make sure retries are done even if no more requests.
+  this_ethread()->schedule_at(this, DNS_PRIMARY_RETRY_PERIOD);

Review comment:
       OK, I'll fix both. Still struggling with other weird things going on.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on a change in pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on a change in pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#discussion_r633685275



##########
File path: iocore/dns/DNS.cc
##########
@@ -766,6 +766,8 @@ DNSHandler::failover()
     ip_text_buffer buff;
     Warning("failover: connection to DNS server %s lost, retrying", ats_ip_ntop(&ip.sa, buff, sizeof(buff)));
   }
+  // Make sure retries are done even if no more requests.
+  this_ethread()->schedule_at(this, DNS_PRIMARY_RETRY_PERIOD);

Review comment:
       Interesting question. The code is cloned from [here](https://github.com/apache/trafficserver/blob/master/iocore/dns/DNS.cc#L1021) in `DNSHandler::mainEvent`. Maybe that's broken too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-962502476


   [approve ci]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
bryancall commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-868704570


   @randall is going to take a look at this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-842690977


   Also, timeouts don't seem to be working. The DNS layer never times out the request and the HostDB layer just keeps stacking up the queue for that host. AFAICT it's the generic inactivity timeout that terminates the connection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode merged pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode merged pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode removed a comment on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode removed a comment on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-1032696510


   [approve ci centos]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on pull request #7843: DNS: Fix lack of nameserver failover in low use circumstances.

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on pull request #7843:
URL: https://github.com/apache/trafficserver/pull/7843#issuecomment-964297737


   [approve ci centos]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org