You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/07/20 15:54:40 UTC

[GitHub] [trafficserver] jrushford opened a new pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

jrushford opened a new pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098


   Fixes and issue in ParentSelection and Nexthop strategies where a down parent may not be retried due to retriers limiting.  This PR when merged should be cherry picked to Trafficserver 9.1.x where the issue currently exists.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] jrushford closed pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
jrushford closed pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] jrushford commented on pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
jrushford commented on pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098#issuecomment-889932021


   Closing this PR in favor of #8159


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] jrushford commented on a change in pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
jrushford commented on a change in pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098#discussion_r678536765



##########
File path: proxy/http/remap/NextHopConsistentHash.cc
##########
@@ -326,6 +326,7 @@ NextHopConsistentHash::findNextHop(TSHttpTxn txnp, void *ih, time_t now)
         // check if the host is retryable.  It's retryable if the retry window has elapsed
         _now == 0 ? _now = time(nullptr) : _now = now;
         if ((pRec->failedAt.load() + retry_time) < static_cast<unsigned>(_now)) {
+          NH_Debug(NH_DEBUG_TAG, "[%" PRIu64 "] next hop %s, retriers: %d", sm_id, pRec->hostname.c_str(), pRec->retriers.load());

Review comment:
       With that said, I like how you've abstracted the retriers in the other PR.

##########
File path: proxy/http/remap/NextHopConsistentHash.cc
##########
@@ -326,6 +326,7 @@ NextHopConsistentHash::findNextHop(TSHttpTxn txnp, void *ih, time_t now)
         // check if the host is retryable.  It's retryable if the retry window has elapsed
         _now == 0 ? _now = time(nullptr) : _now = now;
         if ((pRec->failedAt.load() + retry_time) < static_cast<unsigned>(_now)) {
+          NH_Debug(NH_DEBUG_TAG, "[%" PRIu64 "] next hop %s, retriers: %d", sm_id, pRec->hostname.c_str(), pRec->retriers.load());

Review comment:
       I think that was the idea with the else clause @ywkaras, to negate the increment on line 330 if they aren't retriers.  Am I missing something here?  

##########
File path: proxy/http/remap/NextHopConsistentHash.cc
##########
@@ -326,6 +326,7 @@ NextHopConsistentHash::findNextHop(TSHttpTxn txnp, void *ih, time_t now)
         // check if the host is retryable.  It's retryable if the retry window has elapsed
         _now == 0 ? _now = time(nullptr) : _now = now;
         if ((pRec->failedAt.load() + retry_time) < static_cast<unsigned>(_now)) {
+          NH_Debug(NH_DEBUG_TAG, "[%" PRIu64 "] next hop %s, retriers: %d", sm_id, pRec->hostname.c_str(), pRec->retriers.load());

Review comment:
       That was the idea of the else clause, to back out the increment if the transaction on that core wasn't going to be a retrier.  Am I missing something @ywkaras?  I do like how you've abstracted this in the other PR though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ywkaras commented on a change in pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
ywkaras commented on a change in pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098#discussion_r674339701



##########
File path: proxy/http/remap/NextHopConsistentHash.cc
##########
@@ -326,6 +326,7 @@ NextHopConsistentHash::findNextHop(TSHttpTxn txnp, void *ih, time_t now)
         // check if the host is retryable.  It's retryable if the retry window has elapsed
         _now == 0 ? _now = time(nullptr) : _now = now;
         if ((pRec->failedAt.load() + retry_time) < static_cast<unsigned>(_now)) {
+          NH_Debug(NH_DEBUG_TAG, "[%" PRIu64 "] next hop %s, retriers: %d", sm_id, pRec->hostname.c_str(), pRec->retriers.load());

Review comment:
       For example suppose retriers == max_retriers - 1.  Then:
   1.  one core executes line 330, making retriers == max_retriers, so it goes to the else clause.
   2. another core executes line 330, making retriers == max_retriers + 1, so it also goes to the else clause.
   Both cores will then execute line 341, making retriers == max_retriers - 1 again.  But neither one acted as a retrier.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ywkaras commented on a change in pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
ywkaras commented on a change in pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098#discussion_r678550893



##########
File path: proxy/http/remap/NextHopConsistentHash.cc
##########
@@ -326,6 +326,7 @@ NextHopConsistentHash::findNextHop(TSHttpTxn txnp, void *ih, time_t now)
         // check if the host is retryable.  It's retryable if the retry window has elapsed
         _now == 0 ? _now = time(nullptr) : _now = now;
         if ((pRec->failedAt.load() + retry_time) < static_cast<unsigned>(_now)) {
+          NH_Debug(NH_DEBUG_TAG, "[%" PRIu64 "] next hop %s, retriers: %d", sm_id, pRec->hostname.c_str(), pRec->retriers.load());

Review comment:
       Is there some way you guarantee the thread interleaving scenario I describe above can not happen?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ywkaras commented on a change in pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
ywkaras commented on a change in pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098#discussion_r673482790



##########
File path: proxy/http/remap/NextHopHealthStatus.cc
##########
@@ -149,3 +149,29 @@ NextHopHealthStatus::markNextHop(TSHttpTxn txn, const char *hostname, const int
     break;
   }
 }
+
+void
+NextHopHealthStatus::retryComplete(TSHttpTxn txn, const char *hostname, const int port)
+{
+  HttpSM *sm          = reinterpret_cast<HttpSM *>(txn);
+  ParentResult result = sm->t_state.parent_result;
+  int64_t sm_id       = sm->sm_id;
+
+  // make sure we're called back with a result structure for a parent that is being retried.
+  if (result.result != PARENT_SPECIFIED && !result.retry) {
+    return;
+  }
+
+  const std::string host_port = HostRecord::makeHostPort(hostname, port);
+  auto iter                   = host_map.find(host_port);
+  if (iter == host_map.end()) {
+    NH_Debug(NH_DEBUG_TAG, "[%" PRId64 "] no host named %s found in host_map", sm_id, host_port.c_str());
+    return;
+  }
+
+  std::shared_ptr h = iter->second;

Review comment:
       Nitpick:  I would make this `auto &retriers = iter->second->retriers;` to avoid creating a duplicate shared pointer and atomic increment/decrement of the reference count.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ywkaras commented on pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
ywkaras commented on pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098#issuecomment-883756892


   I'm thinking we should make sure there are no race conditions when accessing the retriers count first:  https://github.com/apache/trafficserver/pull/8103


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ywkaras commented on a change in pull request #8098: Fixes an issue in ParentSelection and NextHop strategies where a down parent may not retried

Posted by GitBox <gi...@apache.org>.
ywkaras commented on a change in pull request #8098:
URL: https://github.com/apache/trafficserver/pull/8098#discussion_r674339701



##########
File path: proxy/http/remap/NextHopConsistentHash.cc
##########
@@ -326,6 +326,7 @@ NextHopConsistentHash::findNextHop(TSHttpTxn txnp, void *ih, time_t now)
         // check if the host is retryable.  It's retryable if the retry window has elapsed
         _now == 0 ? _now = time(nullptr) : _now = now;
         if ((pRec->failedAt.load() + retry_time) < static_cast<unsigned>(_now)) {
+          NH_Debug(NH_DEBUG_TAG, "[%" PRIu64 "] next hop %s, retriers: %d", sm_id, pRec->hostname.c_str(), pRec->retriers.load());

Review comment:
       For example suppose retriers == max_retriers - 1.  Then:
   
   1.  one core executes line 330, making retriers == max_retriers, so it goes to the else clause.
   2. another core executes line 330, making retriers == max_retriers + 1, so it also goes to the else clause.
   
   Both cores will then execute line 341, making retriers == max_retriers - 1 again.  But neither one acted as a retrier.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org