You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafficserver.apache.org by GitBox <gi...@apache.org> on 2020/04/20 15:28:48 UTC

[GitHub] [trafficserver] shinrich opened a new pull request #6686: Fixes to hostDB to avoid event and memory leaks

shinrich opened a new pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686


   @randall found memory leaks in a previous hostDB fix I pushed 153c08166ed3ba83f92064e607b37210fa34c297.
   
   I spent quite a bit of time last winter tracking down HostDB events that would trigger after the original continuations had long died.  This was particularly bad in our forward proxy deployments where we didn't have control over the DNS servers.
   
   We got our internal 7 version stable in terms of eliminating the stale hostdb events and the major hostdb leak.  This PR is the functional changes between the two.  The event leak fixes involve better tracking the the scheduled events and making sure to cancel them before cleaning things up.
   
   Our internal version has been running with the asserts ensuring that the action->mutex == action.continuation->mutex.  So in this PR, I left the asserts but got rid of the odd second lock logic.  After staring at enough cores, I am convinced that those occurred in cases where the continuation had been deleted and reassigned. 
   
   I also removed the hostdb_insert_timeout.  That appeared to be remnants of obsolete logic.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] vmamidi commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
vmamidi commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-616753827


   If we are trying to fix #5952, I wonder why didn't HttpSM Continuation cancel pending event when it died? If we think HttpSM canceling the pending DNS event is working as expected, why didn't DNS consider that?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] randall commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
randall commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-616780585


   @zwoop Can you put this PR on docs and see if the leaks you were seeing go away? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-616874375


   @randall can you tell what event was delivered to the HttpTunnel that caused the assert?  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] randall commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
randall commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-616780380


   Running under Instruments.app, I no longer see the HostDB continuation leak. I do see an assert hit here:
   
   ```
   Fatal: HttpSM.cc:2880: failed assertion `event == HTTP_TUNNEL_EVENT_DONE`
   2020-04-20 11:14:18.818074-0700 traffic_server[92646:8797592] Fatal: HttpSM.cc:2880: failed assertion `event == HTTP_TUNNEL_EVENT_DONE`
   traffic_server: received signal 6 (Abort trap: 6)
   traffic_server - STACK TRACE:
   0   traffic_server                      0x000000010c25d468 crash_logger_invoke(int, __siginfo*, void*) + 136
   1   libsystem_platform.dylib            0x00007fff6e91e5fd _sigtramp + 29
   2   ???                                 0x0000000000000000 0x0 + 0
   3   libsystem_c.dylib                   0x00007fff6e7f4808 abort + 120
   4   libtscore.9.dylib                   0x000000010d63c81e ink_abort(char const*, ...) + 350
   5   libtscore.9.dylib                   0x000000010d63990c _ink_assert + 44
   6   traffic_server                      0x000000010c346e36 HttpSM::tunnel_handler(int, void*) + 438
   7   traffic_server                      0x000000010c33ce4a HttpSM::main_handler(int, void*) + 1034
   8   traffic_server                      0x000000010c26091f Continuation::handleEvent(int, void*) + 255
   9   traffic_server                      0x000000010c6fd3fa read_signal_and_update(int, UnixNetVConnection*) + 90
   10  traffic_server                      0x000000010c6fd33d read_signal_done(int, NetHandler*, UnixNetVConnection*) + 45
   11  traffic_server                      0x000000010c6fd303 UnixNetVConnection::readSignalDone(int, NetHandler*) + 35
   12  traffic_server                      0x000000010c6a61b9 SSLNetVConnection::net_read_io(NetHandler*, EThread*) + 3993
   13  traffic_server                      0x000000010c6ebc66 NetHandler::process_ready_list() + 182
   14  traffic_server                      0x000000010c6ec6b5 NetHandler::waitForActivity(long long) + 2181
   15  traffic_server                      0x000000010c735ab2 EThread::execute_regular() + 1058
   16  traffic_server                      0x000000010c735f7d EThread::execute() + 285
   17  traffic_server                      0x000000010c734047 spawn_thread_internal(void*) + 103
   18  libsystem_pthread.dylib             0x00007fff6e92a109 _pthread_start + 148
   19  libsystem_pthread.dylib             0x00007fff6e925b8b thread_start + 15%             
   ```
   
   Unfortunately, I cannot get this to occur under lldb directly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] vmamidi commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
vmamidi commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-617252588


   > @vmamidi The problem isn't that HttpSM died. Rather that multiple events were created for the HostDBContinutation, and the original logic was not correctly tracking those events. So when the HttpSM finished, there were lingering retry events that hadn't been canceled. Some logic needed to track those events so they could be canceled when needed.
   
   if HttpSM finished, and the events weren't canceled? shouldn't HttpSM cancel those events instead of HostDBContinuation trying to clear them? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-617441629


   The events are against the HostDBContinuations.  The stale event cases were against the deleted and possibly reallocated HostDBContiunations.  So if the HostDBContinuation can keep track of all outstanding events against itself, then it can cancel those events before it is deleted.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-616673451


   Updated to address @randall's comments


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
zwoop commented on pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-631696727


   Cherry-picked to v9.0.x branch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zizhong commented on a change in pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
zizhong commented on a change in pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#discussion_r476276616



##########
File path: iocore/hostdb/HostDB.cc
##########
@@ -1175,12 +1174,13 @@ HostDBContinuation::dnsEvent(int event, HostEnt *e)
     timeout = nullptr;
   }
   EThread *thread = mutex->thread_holding;
-  if (event == EVENT_INTERVAL) {
+  if (event != DNS_EVENT_LOOKUP) {
+    // This was an event_interval or an event_immediate
+    // Either we timed out, or remove_trigger_pending gave up on us
     if (!action.continuation) {
       // give up on insert, it has been too long
-      // remove_trigger_pending_dns will notify and clean up all requests
-      // including this one.
-      remove_trigger_pending_dns();
+      hostDB.pending_dns_for_hash(hash.hash).remove(this);
+      hostdb_cont_free(this);

Review comment:
       hmm, should we hold the mutex before free this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich merged pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich merged pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on a change in pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on a change in pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#discussion_r476443773



##########
File path: iocore/hostdb/HostDB.cc
##########
@@ -1175,12 +1174,13 @@ HostDBContinuation::dnsEvent(int event, HostEnt *e)
     timeout = nullptr;
   }
   EThread *thread = mutex->thread_holding;
-  if (event == EVENT_INTERVAL) {
+  if (event != DNS_EVENT_LOOKUP) {
+    // This was an event_interval or an event_immediate
+    // Either we timed out, or remove_trigger_pending gave up on us
     if (!action.continuation) {
       // give up on insert, it has been too long
-      // remove_trigger_pending_dns will notify and clean up all requests
-      // including this one.
-      remove_trigger_pending_dns();
+      hostDB.pending_dns_for_hash(hash.hash).remove(this);
+      hostdb_cont_free(this);

Review comment:
       If it was being called through the event processing logic, the mutex should already be held.  Let me take a look through on how it is being called directly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on a change in pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on a change in pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#discussion_r476447457



##########
File path: iocore/hostdb/HostDB.cc
##########
@@ -1175,12 +1174,13 @@ HostDBContinuation::dnsEvent(int event, HostEnt *e)
     timeout = nullptr;
   }
   EThread *thread = mutex->thread_holding;
-  if (event == EVENT_INTERVAL) {
+  if (event != DNS_EVENT_LOOKUP) {
+    // This was an event_interval or an event_immediate
+    // Either we timed out, or remove_trigger_pending gave up on us
     if (!action.continuation) {
       // give up on insert, it has been too long
-      // remove_trigger_pending_dns will notify and clean up all requests
-      // including this one.
-      remove_trigger_pending_dns();
+      hostDB.pending_dns_for_hash(hash.hash).remove(this);
+      hostdb_cont_free(this);

Review comment:
       Looks like when we call handleEvent we may check for affinity threads but we don't grab the continuation lock. May need to do that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-616874877


   @vmamidi The problem isn't that HttpSM died.  Rather that multiple events were created for the HostDBContinutation, and the original logic was not correctly tracking those events.  So when the HttpSM finished, there were lingering retry events that hadn't been canceled.  Some logic needed to track those events so they could be canceled when needed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop edited a comment on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
zwoop edited a comment on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-617362114


   Running on Docs now (hence the OnDocs label :-).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-622418235


   @randall EOS is very odd.  The tunnel endpoint handlers should have processed the EOS.  There must have been a path that shut down the tunnel but didn't change the HttpSM default handler.  I should spent more time running with debug enabled to see these extra asserts.
   
   @zwoop how has this been running on the docs machine.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-622418235


   @randall EOS is very odd.  The tunnel endpoint handlers should have processed the EOS.  There must have been a path that shut down the tunnel but didn't change the HttpSM default handler.  I should spent more time running with debug enabled to see these extra asserts.
   
   @zwoop how has this been running on the docs machine?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] randall commented on a change in pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
randall commented on a change in pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#discussion_r411484138



##########
File path: iocore/hostdb/HostDB.cc
##########
@@ -1420,37 +1418,38 @@ HostDBContinuation::dnsEvent(int event, HostEnt *e)
         return EVENT_CONT;
       }
 
-      // We have seen cases were the action.mutex != action.continuation.mutex.
+      // We have seen cases were the action.mutex != action.continuation.mutex.  However, it seems that case
+      // is likely a memory corruption... Thus the introduction of the asssert.

Review comment:
       assert




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
zwoop commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-617362114


   Running on Docs now (hence the OnDocs labe :-).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] randall commented on issue #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
randall commented on issue #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-617243024


   @shinrich It's 104 (`VC_EVENT_EOS` ?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #6686: Fixes to hostDB to avoid event and memory leaks

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #6686:
URL: https://github.com/apache/trafficserver/pull/6686#issuecomment-631629579


   I had pushed some additional commits to this PR to track down the assert that @randall was seeing.  We determined that this assert was not dependent on the hostdb changes, so I moved those fixes to PR #6809.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org