You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@tomcat.apache.org by bu...@apache.org on 2012/07/05 23:56:22 UTC

[Bug 53513] New: Race condition in session replication at node startup

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

          Priority: P2
            Bug ID: 53513
          Assignee: dev@tomcat.apache.org
           Summary: Race condition in session replication at node startup
          Severity: major
    Classification: Unclassified
                OS: Linux
          Reporter: djohle@industrialinfo.com
          Hardware: PC
            Status: NEW
           Version: 7.0.26
         Component: Cluster
           Product: Tomcat 7

My configuration:

2 nodes running Tomcat 7.0.26
Using a custom session manager, which extends the DeltaManager

My startInternal() method first calls super.startInternal(), then performs a
few additional initializations.

I reviewed the code of DeltaManager.startInternal(), and it calls
getAllClusterSessions() which in turn calls waitForSendAllSessions(), which
requires either getStateTransfered() to return true, or a timeout.

So by this, I should be able to trust that as the second node starts, the
initial sync up of all session data from the first node has completed prior to
the startInternal() method exiting (and thus prior to my initializations).

This is, however, not the case!  I can confirm this by repeatedly logging the
value of findSessions().length during my inializations, and see that number
going up!

There appears to be a race condition between the processing of the message
containing the actual session data & the "transfer complete" message.  After
tracing this through a little further, I see the stateTransfered is set to true
in the handleALL_SESSION_TRANSFERCOMPLETE() callback method.  And that callback
is being called PRIOR to the session data itself even being received!

Here is the debug logging output (slightly scrubbed) which shows this out of
order messaging:

Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager
getAllClusterSessions
INFO: Manager [wwwtest#], requesting session state from
org.apache.catalina.tribes.membership.MemberImpl[...].
This operation will timeout if no session state has been received within 60
seconds.

Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager
messageReceived
FINE: Manager [wwwtest#]: Received SessionMessage of
type=(SESSION-STATE-TRANSFERED) from
[org.apache.catalina.tribes.membership.MemberImpl[...]

Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager
handleALL_SESSION_TRANSFERCOMPLETE
FINE: Manager [wwwtest#] received from node [[B@6789b939:4,000] session state
transfered.

Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager
messageReceived
FINE: Manager [wwwtest#]: Received SessionMessage of type=(ALL-SESSION-DATA)
from [org.apache.catalina.tribes.membership.MemberImpl[...]

Jul 5, 2012 4:20:41 PM org.apache.catalina.ha.session.DeltaManager
handleALL_SESSION_DATA
FINE: Manager [wwwtest#]: received session state data

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

[Bug 53513] Race condition / out of order operation in session replication at node startup

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

--- Comment #2 from Keiichi Fujino <kf...@apache.org> ---
Thanks for the report.

I think there is a problem with the behavior of the DeltaManager.

As you know, DeltaManager is responsible for synchronizing the session on
startup.
A node receiving the EVT_GET_ALL_SESSIONS message is to serialize all session, 
and then sends back a EVT_ALL_SESSION_DATA message.
After completing  EVT_ALL_SESSION_DATA message, sends a
EVT_ALL_SESSION_TRANSFERCOMPLETE message.

At this time, if channelSendOptions is asynchronous(default),
EVT_ALL_SESSION_DATA message is sent asynchronously.

As a result, will be a race condition between the processing of the message
containing 
the actual session data and the "transfer complete" message.

I'm going to fix this behavior.
I intend to make EVT_ALL_SESSION_DATA message always send in synchronous mode.

Anyway the current workaround is to set 6 (sync + ack) to channelSendOptions.

Best Regards.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

[Bug 53513] Race condition / out of order operation in session replication at node startup

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

--- Comment #3 from Keiichi Fujino <kf...@apache.org> ---
Fixed in trunk and 7.0.x and will be included in 7.0.30 onwards.
Proposed for 6.0.x.

Note:
In this fix, EVT_ALL_SESSION_DATA message is sent in synchronous mode. 
Therefore, it waits for completion of a the message till Sender#timeout
(default 3000 milliseconds). 
When timeout occurs while sending the EVT_ALL_SESSION_DATA message, 
you can configure following attributes. 
Sender#timeout
DeltaManager#sendAllSessions DeltaManager#sendAllSessionsSize 
DeltaManager#sendAllSessionsWaitTime

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

[Bug 53513] Race condition / out of order operation in session replication at node startup

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

gbalogh@industrialinfo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gbalogh@industrialinfo.com

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

[Bug 53513] Race condition / out of order operation in session replication at node startup

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

--- Comment #1 from David <dj...@industrialinfo.com> ---
In case it's helpful, here's the Cluster configuration...fairly basic stuff:


        <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
          <Manager  className="my.deltamanager.extension.CustomManager"
            expireSessionsOnShutdown="false"
            notifyListenersOnReplication="true"/>

          <Channel className="org.apache.catalina.tribes.group.GroupChannel">

            <Membership
className="org.apache.catalina.tribes.membership.McastService"
                        address="239.1.1.1"
                        port="45564"
                        frequency="500"
                        dropTime="3000"/>
            <Receiver
className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                      address="auto"
                      port="4000"
                      autoBind="100"
                      selectorTimeout="5000"
                      maxThreads="6"/>

            <Sender
className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
              <Transport
className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
            </Sender>

            <Interceptor
className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>

            <Interceptor
className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>

          </Channel>

          <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
                 filter="\*\.page"/>
          <Valve
className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>

          <ClusterListener
className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
          <ClusterListener
className="org.apache.catalina.ha.session.ClusterSessionListener"/>

        </Cluster>

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

[Bug 53513] Race condition / out of order operation in session replication at node startup

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Cluster                     |Cluster
            Version|7.0.26                      |6.0.35
            Product|Tomcat 7                    |Tomcat 6
   Target Milestone|---                         |default

--- Comment #4 from Mark Thomas <ma...@apache.org> ---
Moving to Tomcat 6 since it has been fixed in 7.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

[Bug 53513] Race condition / out of order operation in session replication at node startup

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Mark Thomas <ma...@apache.org> ---
Fixed in 6.0.x and will be included in 6.0.36 onwards.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

[Bug 53513] Race condition / out of order operation in session replication at node startup

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53513

David <dj...@industrialinfo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Race condition in session   |Race condition / out of
                   |replication at node startup |order operation in session
                   |                            |replication at node startup

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org