Posted to issues@trafficserver.apache.org by "John Plevyak (Created) (JIRA)" <ji...@apache.org> on 2012/03/20 18:11:46 UTC

[jira] [Created] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent
----------------------------------------------------------------------------

                 Key: TS-1158
                 URL: https://issues.apache.org/jira/browse/TS-1158
             Project: Traffic Server
          Issue Type: Bug
          Components: Core
    Affects Versions: 3.0.3
         Environment: ALL
            Reporter: John Plevyak
            Assignee: John Plevyak
             Fix For: 3.1.4


Because of the way session management works, the vio.mutex must be re-verified, after the lock is acquired, to be identical to the one the lock was taken on.  Otherwise there is a race when the mutex is switched, such that the old lock is held while the new lock is not held.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by "taorui (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234459#comment-13234459 ] 

taorui commented on TS-1158:
----------------------------

excellent, thanks again. 


[jira] [Updated] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-1158:
------------------------------

    Backport to Version: 3.0.5  (was: 3.0.4)
    

[jira] [Updated] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by "John Plevyak (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak updated TS-1158:
-----------------------------

    Attachment: ts-1158-jp1.patch
    

[jira] [Commented] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235725#comment-13235725 ] 

John Plevyak commented on TS-1158:
----------------------------------

I am not sure either, hence the new jira issue.
                

[jira] [Commented] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by "weijin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234171#comment-13234171 ] 

weijin commented on TS-1158:
----------------------------

I see that the read_from_net and write_to_net_io functions also have such a mechanism to prevent the race condition. I have read the code again and again, but I still cannot figure out how the mutex is switched. Can you explain it in more detail? I would also like to know what the consequences of the race are. Thanks very much.
                

Re: [jira] [Commented] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by taorui <we...@126.com>.
I am afraid the race is (may be one of) the root cause of TS-857, but I
am not sure.





[jira] [Commented] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234393#comment-13234393 ] 

John Plevyak commented on TS-1158:
----------------------------------

The mutex switch occurs in the HttpSessionManager.  When a session is passed to it, the read.vio.mutex and write.vio.mutex from the old controlling HttpSM are replaced with the mutex of a hash bucket of sessions in the Manager (hashed to reduce contention on this globally shared data structure).  When a session is requested from the HttpSessionManager, they are replaced with the mutex of the new HttpSM which will be using that OS connection.  During the swap, both the previous and the new mutexes are held.  Nevertheless, a race is possible: a thread reads the old (pre-substitution) mutex pointer, then a context switch occurs, the mutexes are swapped and the lock on the old mutex is released; the first thread then resumes and locks the old mutex, and now two threads are running while each thinks it is holding the mutex for the NetVC.  The solution is to ensure, after the lock has been taken, that the mutex we have locked is the same one that is protecting the NetVC.  If it is not, we back out and retry later.
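
For readers who want to see the shape of the fix, here is a minimal sketch of the "lock, then re-verify" step in standard C++. It is not the contents of ts-1158-jp1.patch and does not use Traffic Server's own mutex types; Conn, lock_verified and handle_event are hypothetical names made up for illustration, and std::mutex stands in for the real mutex class.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a NetVC whose protecting mutex can be switched.
struct Conn {
    std::atomic<std::mutex *> mutex;  // mutex currently protecting this object
    int events_handled = 0;           // state guarded by *mutex
    explicit Conn(std::mutex *m) : mutex(m) {}
};

// Try to lock the mutex we read earlier (snapshot), then confirm that it is
// still the mutex protecting c. If the mutex was switched in between, release
// the stale lock. The returned lock owns the mutex only on success.
std::unique_lock<std::mutex> lock_verified(Conn &c, std::mutex *snapshot) {
    std::unique_lock<std::mutex> lock(*snapshot, std::try_to_lock);
    if (lock.owns_lock() && c.mutex.load() != snapshot) {
        lock.unlock();  // we hold the old mutex, not the current one: back out
    }
    return lock;
}

// Returns true if the event was handled, false if the caller should retry later.
bool handle_event(Conn &c) {
    std::unique_lock<std::mutex> lock = lock_verified(c, c.mutex.load());
    if (!lock.owns_lock()) {
        return false;  // lock missed or mutex switched under us
    }
    ++c.events_handled;  // safe: we hold the mutex that currently protects c
    return true;
}
```

Passing the snapshot in as a parameter makes the bad interleaving easy to reproduce by hand: read the pointer, switch c.mutex to another mutex, then call lock_verified with the stale pointer and observe that it refuses the lock. The check is only sound if whoever switches the mutex holds the old one while doing so.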
                





[jira] [Commented] (TS-1158) Race on mutex switching for NetVConnections in UnixNetVConnection::mainEvent

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234396#comment-13234396 ] 

John Plevyak commented on TS-1158:
----------------------------------

Note that when replacing a mutex, both the new and old mutexes must be held.   Also note that this protection (double checking) is only provided in the NetProcessor as it is the only Processor whose VC mutexes are switched.  Any virtualization would need to provide the same protection.
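
The "both mutexes must be held" rule can be sketched as follows. This is only an illustration under assumptions: Conn and switch_mutex are hypothetical names, std::mutex stands in for the real mutex type, and blocking std::lock is used for brevity, whereas an event-driven server would more likely use try-locks and reschedule.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a VC whose protecting mutex can be switched.
struct Conn {
    std::atomic<std::mutex *> mutex;  // mutex currently protecting this object
    explicit Conn(std::mutex *m) : mutex(m) {}
};

// Hand c over from its current mutex to next, holding both during the switch.
// Returns false if someone else switched the mutex before we got the locks,
// in which case the caller should retry.
bool switch_mutex(Conn &c, std::mutex &next) {
    std::mutex *prev = c.mutex.load();
    if (prev == &next) {
        return true;  // nothing to do
    }
    std::lock(*prev, next);  // acquires both without risking lock-order deadlock
    std::lock_guard<std::mutex> hold_prev(*prev, std::adopt_lock);
    std::lock_guard<std::mutex> hold_next(next, std::adopt_lock);
    if (c.mutex.load() != prev) {
        return false;  // the same double check applies to the switcher
    }
    c.mutex.store(&next);
    return true;
}
```

Holding the old mutex keeps out any thread that already owns it, and holding the new one keeps the object from being used under the new mutex before the switch is complete.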
                
