You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@trafficserver.apache.org by GitBox <gi...@apache.org> on 2020/11/20 02:05:01 UTC

[GitHub] [trafficserver] shinrich opened a new pull request #7337: Fix vc close migration race condition

shinrich opened a new pull request #7337:
URL: https://github.com/apache/trafficserver/pull/7337


   The issue contains the details.  This problem came in with PR #7278.  Once we added that commit, our canary production box would crash at least once an hour with corrupted memory stacks (examples in issue).  We ran an ASAN version of the build in our prod sim and came up with a use-after-free and a very similar double free failure (again stacks in the issue).
   
   The use after free issue was do to the close flag being set early in UnixNetVConnection::do_io_close, but having accesses to the object later.  The other thread would see the close flag set when walking the event look and delete the netvc object from underneath the first thread.
   
   We have been running a build with this patch on a production box for over 3 hours now.  We will expand our testing of this, but it seems likely that this is addressing the crash we were seeing.
   
   This closes issue #7333 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] sudheerv commented on a change in pull request #7337: Fix vc close migration race condition

Posted by GitBox <gi...@apache.org>.
sudheerv commented on a change in pull request #7337:
URL: https://github.com/apache/trafficserver/pull/7337#discussion_r527799494



##########
File path: iocore/net/UnixNetVConnection.cc
##########
@@ -686,6 +680,14 @@ UnixNetVConnection::do_io_close(int alerrno /* = -1 */)
     this->lerrno = alerrno;
   }
 
+  // Must mark for closed last in case this is a
+  // cross thread migration scenario.
+  if (alerrno == -1) {

Review comment:
       Does setting `closed` here really prevent the use-after-free? Presumably, this thread is still trying to clean up the VC, which means it's still running the ::free/::clear routines. Wouldn't the race still exist for the read ready loop then? I wonder you will still see the issue from ASAN even with this new change.
   
   I think we do need to delay setting the read.vio.cont and write.vio.cont to nullptr in do_io_close to after setting close  flag, as otherwise, the net_read_io may get unblocked on the read vio mutex and run a previously rescheduled read event




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich edited a comment on pull request #7337: Fix vc close migration race condition

Posted by GitBox <gi...@apache.org>.
shinrich edited a comment on pull request #7337:
URL: https://github.com/apache/trafficserver/pull/7337#issuecomment-731486035


   Added a commit to defer clearing the the vio continations as @sudheerv suggested.  I updated our test build with this and it is  also running well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich merged pull request #7337: Fix vc close migration race condition

Posted by GitBox <gi...@apache.org>.
shinrich merged pull request #7337:
URL: https://github.com/apache/trafficserver/pull/7337


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on pull request #7337: Fix vc close migration race condition

Posted by GitBox <gi...@apache.org>.
zwoop commented on pull request #7337:
URL: https://github.com/apache/trafficserver/pull/7337#issuecomment-738244377


   Cherry-picked to v9.0.x branch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7337: Fix vc close migration race condition

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7337:
URL: https://github.com/apache/trafficserver/pull/7337#issuecomment-731486035


   Added a commit to defer clearing the the vio continations as @sudheerv suggested.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on a change in pull request #7337: Fix vc close migration race condition

Posted by GitBox <gi...@apache.org>.
shinrich commented on a change in pull request #7337:
URL: https://github.com/apache/trafficserver/pull/7337#discussion_r527806302



##########
File path: iocore/net/UnixNetVConnection.cc
##########
@@ -686,6 +680,14 @@ UnixNetVConnection::do_io_close(int alerrno /* = -1 */)
     this->lerrno = alerrno;
   }
 
+  // Must mark for closed last in case this is a
+  // cross thread migration scenario.
+  if (alerrno == -1) {

Review comment:
       I can try moving clearing the vio contintations to the free.  But even reducing the race window this tiny amount seems to have eliminated the crash for us.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org