You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2008/12/12 09:14:36 UTC

DO NOT REPLY [Bug 46384] New: Due to missing synchronization, a member may disappear permanent.

https://issues.apache.org/bugzilla/show_bug.cgi?id=46384

           Summary: Due to missing synchronization, a member may disappear
                    permanent.
           Product: Tomcat 5
           Version: 5.5.27
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Catalina:Cluster
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: mharm@airplus.com


Below there is a "pseudo-code-extract" of the McastServiceImpl Recieiver- and
Sender-Thread flow.

Now assume the following situation:
- ServerA,ServerB in a cluster; both had added each other to theire
McastMembership 

On ServerA:
t0: The "Sender"-Thread is at position [P0], and found the mm = "ServerB". 
    So in the moment "ServerB" is not in the McastMembership.map!!

t1: The "Receiver"-Thread receives a packet from "ServerB", 
    add this to the McastMembership, 
    calls the SimpleTcpCluster.memberAdded("ServerB") 
    and blocks on [P1]

t2: The "Sender"-Thread continues,
    calls SimpleTcpCluster.memberDisappeared("ServerB").


This leads to the following situation:
- the "ServerB" is in the McastMembership.map (and without timeouts, it wont
disappear)
- there is no Sessionreplication to "ServerB" 


That's it



Thread: Cluster-MembershipReceiver

McastServiceImpl.receive
   added= sync McastMembership.memberAlive(mm) { 
     if (mm not in map) then map+=mm;return true;
     else (mark mm as new); return false;
   }
   if (added) {
      SimpleTcpCluster.memberAdded(mm)
        log.info("Replication member added:" + member);
        sync ReplicationTransmitter.add(mm);
   }

  checkExpire
   ---[P1]---
   sync on McastServiceImpl(expiredMutex) {
      mm = sync McastMembership.expire() {
         if (mm in map to old) then map-=mm;
         return mm;
      }
      SimpleTcpCluster.memberDisappeared(mm);
        log.info("Received member disappeared:" + member);
        sync ReplicationTransmitter.remove(mm);
   }                      



Thread: Cluster-MembershipSender

  McastServiceImpl.send()

  checkExpire
   sync on McastServiceImpl(expiredMutex) {
      mm = sync McastMembership.expire() {
         if (mm in map to old) then map-=mm;
         return mm;
      }
      ---[P0]---
      SimpleTcpCluster.memberDisappeared(mm);
        log.info("Received member disappeared:" + member);
        sync ReplicationTransmitter.remove(mm);
   }


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384





--- Comment #4 from Mark Thomas <ma...@apache.org>  2009-07-01 05:31:22 PST ---
Patch withdrawn based on Filip's comment

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384





--- Comment #2 from Sebb <se...@apache.org>  2009-04-16 15:55:45 PST ---
Might be an idea to make the field "memebrshipMutex" (sic) final, as otherwise
the synchronisation is not guaranteed to work.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #23501|0                           |1
        is obsolete|                            |

--- Comment #5 from Mark Thomas <ma...@apache.org> 2009-09-13 19:14:42 BST ---
Created an attachment (id=24253)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24253)
Updated patch for this issue

I found the time to take another look at this.

Whilst Filip's comment about threads locking up is correct - and Tomcat 6 does
have a fix for that - threads locking up is not at the root of this issue. At
the root of this issue is there there are two lists of cluster members. One in
McastServiceImpl.membership and one in ReplicationTransmitter.map

Whilst checkExpire() does update both lists with the sync on expiredMutex, the
receiver thread updates the McastServiceImpl.membership outside of this mutex.
That leads to the problem that the OP is describing here.

Whilst Tomcat 6 does contain a fix for this, the code bases have diverged
sufficiently that the fix would be invasive. Therefore I am proposing a patch
for Tomcat that is similar to my earlier patch but has a slightly wider sync
block based on my better understanding of this issue.

I have tested the patch and whilst I can force this issue using a debugger
without the patch, I can not force it with the patch in place.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384





--- Comment #3 from Filip Hanik <fh...@apache.org>  2009-04-16 20:58:18 PST ---
(In reply to comment #1)
> Created an attachment (id=23501)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23501) [details]
> Patch to fix this issue
> 
> The attached patch should fix this although I haven't tested it.

I don't think that patch will fix it. The key problem here is that if the
sender thread gets locked up, it will stop broadcast the member itself, and
other nodes will deem it gone.

The only solution here is to not lock up the sender thread ever. The same goes
for the receiver thread. 

The code is a bit of a sync spaghetti mess, but Tomcat 6.0 has the fix for
this, that will prevent it from locking up these two threads.

TC 6 also has secondary verification mechanism, that are unrelated to this.

You'd be better off backporting the fix from Tomcat 6 to Tomcat 5

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #6 from Mark Thomas <ma...@apache.org> 2009-11-30 16:33:13 GMT ---
Fixed in trunk. Many thanks.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384

--- Comment #7 from Konstantin Kolinko <kn...@gmail.com> 2010-03-11 13:58:50 UTC ---
As said in http://marc.info/?l=tomcat-dev&m=125934902622453&w=2
it is fixed in 5.5.29.
(Probably that comment disappeared in the Bugzilla data loss/rollback incident)

The commit that fixed the issue is r884960.

As mentioned in comment 5 and comment 3, Tomcat 6 and trunk are not affected by
this issue.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384





--- Comment #1 from Mark Thomas <ma...@apache.org>  2009-04-16 13:49:55 PST ---
Created an attachment (id=23501)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23501)
Patch to fix this issue

The attached patch should fix this although I haven't tested it.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 46384] Due to missing synchronization, a member may disappear permanent.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46384

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #6 from Mark Thomas <ma...@apache.org> 2009-11-27 11:09:59 GMT ---
This has been fixed in 5.5.x and will be included in 5.5.29 onwards.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org