Posted to users@tomcat.apache.org by Mikel Ibiricu <jl...@gmail.com> on 2009/03/09 09:24:54 UTC

Re: Tomcat Clustering trouble when starting up under high load

Hi Filip

Thanks for your response. We have been testing some modifications to our
config, especially focusing on what you told us about limiting
stateTransferTimeout, which we have now set to 180 seconds. With that, it no
longer gets stuck; in the worst case it simply starts without replicating
anything.

I have also been trying some finer adjustments, especially in the sender and
receiver config. Our current config looks like this:

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
        <Manager className="org.apache.catalina.ha.session.DeltaManager"
                 name="clusterPruebas6"
                 stateTransferTimeout="180"
                 expireSessionsOnShutdown="false"
                 notifyListenersOnReplication="false"/>

        <Channel className="org.apache.catalina.tribes.group.GroupChannel">
            <Membership className="org.apache.catalina.tribes.membership.McastService"
                        address="228.0.0.9"
                        bind="172.26.102.233"
                        port="45569"
                        frequency="500"
                        dropTime="15000"
                        soTimeout="10000"/>
            <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                      address="172.26.102.233"
                      port="4009"
                      autoBind="100"
                      selectorTimeout="5000"
                      maxThreads="25"
                      timeout="3000"/>
            <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
                <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                           keepAliveCount="50"
                           soLingerOn="false"
                           direct="false"
                           poolSize="25"
                           soReuseAddress="true"
                           ooBInline="true"
                           maxRetryAttempts="0"
                           throwOnFailedAck="false"/>
            </Sender>
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
            <!--
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
                <Member className="org.apache.catalina.tribes.membership.StaticMember"
                        port="45569"
                        securePort="-1"
                        host="172.26.102.233"
                        domain="cluster"
                        uniqueId="{0,1,2,3,4,5,6,7,8,10}"/>
            </Interceptor>
            -->
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
            <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>

            <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
                   filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>

            <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
        </Channel>

</Cluster>
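As a quick sanity check on the Membership settings above (these numbers come straight from the config, not from any recommendation): frequency="500" means a heartbeat is multicast every 500 ms, and dropTime="15000" means a member is evicted after roughly 15000 / 500 = 30 consecutive missed heartbeats. A trivial sketch of that arithmetic (illustrative class, not Tomcat API):

```java
// Illustrative only: relates the McastService frequency and dropTime
// attributes from the config above. Not a Tomcat class; the names here
// are made up for this sketch.
public class HeartbeatMath {

    // How many heartbeat intervals fit into dropTime before eviction.
    static long missedHeartbeatsBeforeDrop(long frequencyMs, long dropTimeMs) {
        return dropTimeMs / frequencyMs;
    }

    public static void main(String[] args) {
        // frequency="500", dropTime="15000" -> 30 missed heartbeats
        System.out.println(missedHeartbeatsBeforeDrop(500, 15000));
    }
}
```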

We tried static membership, without any apparent performance change over the
mcast membership.

So, it works OK, except that when one node starts up while the other holds
over 500 live sessions, it doesn't replicate anything. We could accept that
it might not be able to replicate everything... but why does it replicate
either everything or nothing? If it can't transfer all the sessions we could
live with that, but we would like it to replicate at least some of them.
Could this be a problem with the config we are trying? We tried limiting the
keepAliveCount of the senders, without improvement.

Reading the DeltaManager code, I have seen the sendAllSessions parameter.
According to the in-line javadoc and the implementation,

/**
 * handle receive that other node want all sessions ( restart )
 * a) send all sessions with one message
 * b) send session at blocks
 * After sending send state is complete transfered
 * @param msg
 * @param sender
 * @throws IOException
 */
protected void handleGET_ALL_SESSIONS(SessionMessage msg, Member sender) throws IOException {

    [...]

    if (isSendAllSessions()) {
        sendSessions(sender, currentSessions, findSessionTimestamp);
    } else {
        // send session at blocks
        int len = currentSessions.length < getSendAllSessionsSize()
                ? currentSessions.length : getSendAllSessionsSize();
        Session[] sendSessions = new Session[len];
        for (int i = 0; i < currentSessions.length; i += getSendAllSessionsSize()) {
            len = i + getSendAllSessionsSize() > currentSessions.length
                    ? currentSessions.length - i : getSendAllSessionsSize();
            System.arraycopy(currentSessions, i, sendSessions, 0, len);
            sendSessions(sender, sendSessions, findSessionTimestamp);
            if (getSendAllSessionsWaitTime() > 0) {
                try {
                    Thread.sleep(getSendAllSessionsWaitTime());
                } catch (Exception sleep) {
                }
            }//end if
        }//for
    }//end if

    SessionMessage newmsg = new SessionMessageImpl(name,
            SessionMessage.EVT_ALL_SESSION_TRANSFERCOMPLETE,
            null, "SESSION-STATE-TRANSFERED", "SESSION-STATE-TRANSFERED" + getName());
    newmsg.setTimestamp(findSessionTimestamp);
    if (log.isDebugEnabled())
        log.debug(sm.getString("deltaManager.createMessage.allSessionTransfered", getName()));
    counterSend_EVT_ALL_SESSION_TRANSFERCOMPLETE++;
    cluster.send(newmsg, sender);
}
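To make the block-send branch concrete: with 500 sessions and sendAllSessionsSize="200", the loop above would send chunks of 200, 200 and 100 sessions, sleeping sendAllSessionsWaitTime between chunks. Here is a standalone sketch of just that chunking arithmetic (no Tribes classes; the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the chunk sizes produced by handleGET_ALL_SESSIONS
// when sendAllSessions is false. Mirrors the loop bounds above; not Tomcat API.
public class SessionChunking {

    static List<Integer> chunkSizes(int totalSessions, int sendAllSessionsSize) {
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < totalSessions; i += sendAllSessionsSize) {
            // The last chunk may be smaller than sendAllSessionsSize.
            sizes.add(Math.min(sendAllSessionsSize, totalSessions - i));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 500 sessions, chunk size 200 -> [200, 200, 100]
        System.out.println(chunkSizes(500, 200));
    }
}
```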

it may cover our expectations, so I tried to include sendAllSessions="false"
in the cluster Manager configuration:

<Manager className="org.apache.catalina.ha.session.DeltaManager"
         name="clusterPruebas6"
         stateTransferTimeout="180"
         expireSessionsOnShutdown="false"
         notifyListenersOnReplication="false"
         sendAllSessions="false"/>

But it seems this parameter is not configurable from server.xml. So, what can
I do so that, if it is not possible to replicate all the sessions, it at
least replicates some of them at startup?

Thanks a lot
Mikel

2009/1/31 Filip Hanik - Dev Lists <de...@hanik.com>

> a state transfer timeout of -1 is not a good setting. one should prefer to
> time out rather than get stuck, even if the timeout means we didn't get
> everything
>
> you may also try the backup manager, which does the state transfer in a bit
> smarter manner.
>
> when your system is stuck, thread dumps are crucial to resolving your
> actual issue
>
> Filip
>
>
> Mikel Ibiricu wrote:
>
>> Hello all
>>
>> I'm Mikel, and my workmates and I have been testing our environment for a
>> while in order to set up an in-memory session replication cluster for our
>> servers. The thing is that our servers often carry a thousand (1000) or
>> more sessions (and we now have three Tomcat nodes on different machines
>> behind a load balancer!). At peak times I have counted up to 4500 sessions
>> distributed across the three servers.
>>
>> So, I'll describe the main configuration of our production environment. Web
>> servers:
>>
>> Two Windows Server 2003 machines with IIS and the isapi_redirect.dll connector.
>>
>> App servers
>>
>> node 1 & 2: IBM Xseries_3550 Intel Xeon CPU 5150 @2,66GHz, 2,00 GB RAM,
>> Windows 2003 Server R2
>> node3: IBM XSeries_366 Intel Xeon CPU 3,20Ghz, 3,00 GB RAM, Windows Server
>> 2003 R2
>>
>> In our development environment, where we have been running our tests, we
>> have two Tomcats on two different machines:
>>
>> node 1: Intel Xeon CPU E5440 @ 2,83GHz, 1,00 GB RAM, Windows Server 2003
>> R2
>> node 2: IBM XSeries_3550 Intel Xeon CPU E5440 @ 2,83 GHz, 2GB RAM, Windows
>> Server 2003 R2
>>
>> We have been testing the Tomcat 5.5.9 that we use in production and ran
>> into some trouble, even after applying the clustering fix pack from
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=34389 . Finally, we
>> decided to upgrade to the latest available Tomcat 6, version 6.0.18, to
>> see whether the announced refactoring of the cluster subsystem would solve
>> our trouble.
>>
>> After all this pretty long intro, here is the reason for my request. When
>> we test the cluster (in the development environment) with both nodes
>> running, we create up to a thousand sessions, keeping about 500 of them
>> alive and being modified. We can see that all of them get replicated to
>> the other node quickly. The trouble comes when, after shutting down one of
>> the instances, we start it again (while half of the live sessions are
>> still being modified by the JMeter test). In our tests, the starting
>> Tomcat instance ends up hanging while receiving sessions from the live
>> node.
>>
>> These are the traces seen in the catalina.log of node 1 when starting it:
>>
>> Jan 26, 2009 6:51:56 PM org.apache.catalina.core.AprLifecycleListener init
>> INFO: The APR based Apache Tomcat Native library which allows optimal
>> performance in production environments was not found on the
>> java.library.path:
>> C:\tomcat-6.0.18\bin;.;C:\WINDOWS\system32;C:\WINDOWS;C:\Program
>> Files\Serena\Dimensions
>>
>> 10.1\CM\prog;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\system32\WBEM;C:\Program
>> Files\IBM\Director\bin;C:\Program Files\Common
>> Files\IBM\ICC\cimom\bin;C:\Program Files\System Center Operations Manager
>> 2007\
>> Jan 26, 2009 6:51:56 PM org.apache.coyote.http11.Http11Protocol init
>> INFO: Initializing Coyote HTTP/1.1 on http-9080
>> Jan 26, 2009 6:51:56 PM org.apache.coyote.http11.Http11Protocol init
>> INFO: Initializing Coyote HTTP/1.1 on http-9081
>> Jan 26, 2009 6:51:56 PM org.apache.catalina.startup.Catalina load
>> INFO: Initialization processed in 1928 ms
>> Jan 26, 2009 6:51:56 PM org.apache.catalina.core.StandardService start
>> INFO: Starting service Catalina
>> Jan 26, 2009 6:51:56 PM org.apache.catalina.core.StandardEngine start
>> INFO: Starting Servlet Engine: Apache Tomcat/6.0.18
>> Jan 26, 2009 6:51:56 PM org.apache.catalina.ha.tcp.SimpleTcpCluster start
>> INFO: Cluster is about to start
>> Jan 26, 2009 6:51:56 PM org.apache.catalina.tribes.transport.ReceiverBase
>> bind
>> INFO: Receiver Server Socket bound to:/172.26.102.233:4009
>> Jan 26, 2009 6:51:56 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
>> INFO: Attempting to bind the multicast socket to /228.0.0.9:45569
>> Jan 26, 2009 6:51:56 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
>> INFO: Binding to multicast address, failed. Binding to port only.
>> Jan 26, 2009 6:51:56 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
>> INFO: Setting multihome multicast interface to:/172.26.102.233
>> Jan 26, 2009 6:51:56 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
>> INFO: Setting cluster mcast soTimeout to 1000
>> Jan 26, 2009 6:51:56 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
>> INFO: Sleeping for 2000 milliseconds to establish cluster membership,
>> start
>> level:4
>> Jan 26, 2009 6:51:57 PM org.apache.catalina.ha.tcp.SimpleTcpCluster
>> memberAdded
>> INFO: Replication member
>> added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 26,
>> 102,
>> -60}:4009,{-84, 26, 102, -60},4009, alive=1938953,id={-57 67 34 -23 -38 83
>> 74 68 -67 -87 -112 -94 13 102 -78 -20 }, payload={}, command={},
>> domain={},
>> ]
>> Jan 26, 2009 6:51:58 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
>> INFO: Done sleeping, membership established, start level:4
>> Jan 26, 2009 6:51:58 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
>> INFO: Sleeping for 2000 milliseconds to establish cluster membership,
>> start
>> level:8
>> Jan 26, 2009 6:51:58 PM org.apache.catalina.tribes.io.BufferPool
>> getBufferPool
>> INFO: Created a buffer pool with max size:104857600 bytes of
>> type:org.apache.catalina.tribes.io.BufferPool15Impl
>> Jan 26, 2009 6:52:00 PM
>> org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
>> INFO: Done sleeping, membership established, start level:8
>> Jan 26, 2009 6:52:14 PM org.apache.catalina.ha.session.DeltaManager start
>> INFO: Register manager  to cluster element Engine with name Catalina
>> Jan 26, 2009 6:52:14 PM org.apache.catalina.ha.session.DeltaManager start
>> INFO: Starting clustering manager at
>> Jan 26, 2009 6:52:14 PM org.apache.catalina.ha.session.DeltaManager
>> getAllClusterSessions
>> WARNING: Manager [localhost#], requesting session state from
>> org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 26, 102,
>> -60}:4009,{-84, 26, 102, -60},4009, alive=1955953,id={-57 67 34 -23 -38 83
>> 74 68 -67 -87 -112 -94 13 102 -78 -20 }, payload={}, command={},
>> domain={},
>> ]. This operation will timeout if no session state has been received
>> within
>> -1 seconds.
>> Jan 26, 2009 7:00:50 PM org.apache.catalina.startup.Catalina stopServer
>> SEVERE: Catalina.stop:
>> java.net.ConnectException: Connection refused: connect
>>    at java.net.PlainSocketImpl.socketConnect(Native Method)
>>    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:364)
>>    at java.net.Socket.connect(Socket.java:507)
>>    at java.net.Socket.connect(Socket.java:457)
>>    at java.net.Socket.<init>(Socket.java:365)
>>    at java.net.Socket.<init>(Socket.java:178)
>>    at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:421)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>    at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>    at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>    at java.lang.reflect.Method.invoke(Method.java:585)
>>    at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:337)
>>    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:415)
>>
>> The last trace was because, after nearly ten minutes of waiting, we
>> decided to shut the Tomcat instance down. In the running Tomcat (node 2):
>>
>> 26-ene-2009 18:51:58 org.apache.catalina.ha.tcp.SimpleTcpCluster
>> memberAdded
>> INFO: Replication member
>> added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84, 26,
>> 102,
>> -23}:4009,{-84, 26, 102, -23},4009, alive=2047,id={-20 60 112 -113 35 -41
>> 71
>> -5 -124 47 93 -37 117 -9 -9 29 }, payload={}, command={}, domain={}, ]
>> 26-ene-2009 19:00:51
>> org.apache.catalina.tribes.transport.nio.NioReplicationTask run
>> WARNING: IOException in replication worker, unable to drain channel.
>> Probable cause: Keep alive socket closed[An existing connection was
>> forcibly
>> closed by the remote host].
>> 26-ene-2009 19:00:53
>> org.apache.catalina.tribes.group.interceptors.TcpFailureDetector
>> memberDisappeared
>> INFO: Received
>>
>> memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84,
>> 26, 102, -23}:4009,{-84, 26, 102, -23},4009, alive=533282,id={-20 60 112
>> -113 35 -41 71 -5 -124 47 93 -37 117 -9 -9 29 }, payload={}, command={},
>> domain={}, ]] message. Will verify.
>> 26-ene-2009 19:00:54
>> org.apache.catalina.tribes.group.interceptors.TcpFailureDetector
>> memberDisappeared
>> INFO: Verification complete. Member
>> disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84,
>> 26,
>> 102, -23}:4009,{-84, 26, 102, -23},4009, alive=533282,id={-20 60 112 -113
>> 35
>> -41 71 -5 -124 47 93 -37 117 -9 -9 29 }, payload={}, command={},
>> domain={},
>> ]]
>> 26-ene-2009 19:00:54 org.apache.catalina.ha.tcp.SimpleTcpCluster
>> memberDisappeared
>> INFO: Received member
>> disappeared:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-84,
>> 26,
>> 102, -23}:4009,{-84, 26, 102, -23},4009, alive=533282,id={-20 60 112 -113
>> 35
>> -41 71 -5 -124 47 93 -37 117 -9 -9 29 }, payload={}, command={},
>> domain={},
>> ]
>>
>> Our Cluster config on the nodes (as you can see, we configured the
>> cluster at engine level):
>>
>> Node 1:
>>
>> <Engine name="Catalina" defaultHost="localhost" jvmRoute="worker62">
>>
>>    <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
>>
>>        <Manager className="org.apache.catalina.ha.session.DeltaManager"
>>                 name="clusterPruebas6"
>>                 stateTransferTimeout="-1"
>>                 expireSessionsOnShutdown="false"
>>                 notifyListenersOnReplication="true"/>
>>
>>        <Channel className="org.apache.catalina.tribes.group.GroupChannel">
>>            <Membership className="org.apache.catalina.tribes.membership.McastService"
>>                        address="228.0.0.9"
>>                        bind="172.26.102.233"
>>                        port="45569"
>>                        frequency="1000"
>>                        dropTime="3000"/>
>>            <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
>>                      address="172.26.102.233"
>>                      port="4009"
>>                      autoBind="100"
>>                      selectorTimeout="5000"
>>                      maxThreads="12"/>
>>            <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
>>                <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" />
>>            </Sender>
>>            <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
>>            <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
>>        </Channel>
>>
>>    </Cluster>
>> ...
>>
>> Node 2:
>>
>> <Engine name="Catalina" defaultHost="localhost" jvmRoute="worker66">
>>
>>    <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
>>
>>        <Manager className="org.apache.catalina.ha.session.DeltaManager"
>>                 name="clusterPruebas6"
>>                 stateTransferTimeout="-1"
>>                 expireSessionsOnShutdown="false"
>>                 notifyListenersOnReplication="true"/>
>>
>>        <Channel className="org.apache.catalina.tribes.group.GroupChannel">
>>            <Membership className="org.apache.catalina.tribes.membership.McastService"
>>                        address="228.0.0.9"
>>                        bind="172.26.102.196"
>>                        port="45569"
>>                        frequency="1000"
>>                        dropTime="5000"/>
>>            <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
>>                      address="172.26.102.196"
>>                      port="4009"
>>                      autoBind="100"
>>                      selectorTimeout="100"
>>                      maxThreads="12"/>
>>            <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
>>                <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
>>                           maxRetryAttempts="0" />
>>            </Sender>
>>            <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
>>            <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
>>        </Channel>
>>
>>    </Cluster>
>>
>> So, after all this info, here are my questions.
>>
>> 1) Considering the high load of our servers, and even though we think that
>> in-memory replication matches our expectations better than database or
>> file-based persistence, is in-memory replication feasible at this scale,
>> or is it discouraged?
>>
>> 2) In our cluster configuration we have been testing with
>> stateTransferTimeout set to -1 to keep DeltaManager.getAllClusterSessions
>> from giving up, because it is very important for us to replicate all of
>> the sessions to the starting node. Anyway, should we set some other value
>> here?
>>
>> 3) Any other suggestions for our configuration?
>>
>> Thank you very much.
>> Mikel Ibiricu
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

Re: Tomcat Clustering trouble when starting up under high load

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.
hi Mikel,
when setting a property on the <Manager> you omit the manager. prefix,
just as you guessed.
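That is, on a nested <Manager> element the properties from Rainer's mail would be set without the prefix, something like this (untested sketch; the name and timeout values are carried over from Mikel's earlier config, and the size/wait values are just examples):

```xml
<Manager className="org.apache.catalina.ha.session.DeltaManager"
         name="clusterPruebas6"
         stateTransferTimeout="180"
         expireSessionsOnShutdown="false"
         notifyListenersOnReplication="false"
         sendAllSessions="false"
         sendAllSessionsSize="200"
         sendAllSessionsWaitTime="2000"/>
```

(Presumably the manager.-prefixed form applies when the properties are set on the <Cluster> element without a nested <Manager>; check the Tomcat 6 cluster documentation to confirm.)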
To take thread dumps with JDK 1.5 under Windows, you can use the Tanuki
service wrapper:
http://people.apache.org/~fhanik/wrapper.html

The Tomcat team might have added that feature to the Tomcat wrapper too;
you'd have to check.

Filip




Mikel Ibiricu wrote:
> Hi Rainer, thanks for your response.
>
> I tried the config you suggested. I suppose the way to configure those
> parameters is as you said but dropping "manager." from the beginning,
> because when I tried it exactly as you said, I got this in catalina.log:
>
> 10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin
> WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Manager}
> Setting property 'manager.sendAllSessions' to 'false' did not find a
> matching property.
> 10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin
> WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Manager}
> Setting property 'manager.sendAllSessionsSize' to '200' did not find a
> matching property.
> 10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin
> WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Manager}
> Setting property 'manager.sendAllSessionsWaitTime' to '0' did not find a
> matching property.
> 10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin
>
> Maybe without the "manager" prefix? sendAllSessions="false",
> sendAllSessionsSize="200" and sendAllSessionsWaitTime=""?
>
> Filip, we considered taking thread dumps, but we are using JDK 1.5 under
> Windows. I heard it is not supported on Windows until JDK 1.6. Is that true?
>
> Best regards
> Mikel
>
>
>
> 2009/3/9 Rainer Jung <ra...@kippdata.de>
>
>   
>> On 09.03.2009 09:24, Mikel Ibiricu wrote:
>>
>>     
>>> [original message snipped]
>> If you want to test with sendAllSessions set to false, then add
>>
>> manager.sendAllSessions="false"
>>
>> to the Manager element of the cluster configuration.
>>
>> You might then also want to configure:
>>
>> manager.sendAllSessionsSize: the number of sessions that will be serialized
>> and sent in one chunk (default if sendAllSessions="false": 1000)
>>
>> manager.sendAllSessionsWaitTime: the time in milliseconds between sending
>> out consecutive session chunks (default if sendAllSessions="false": 2000)
>>
>> Regards,
>>
>> Rainer
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
>>
>>
>>     
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat Clustering trouble when starting up under high load

Posted by Mikel Ibiricu <jl...@gmail.com>.
Hi Rainer, thanks for your response.

I tried the config you suggested. I suppose the way to configure those
parameters is as you said but dropping "manager." from the beginning,
because when I tried it exactly as you said, I got this in catalina.log:

10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Manager}
Setting property 'manager.sendAllSessions' to 'false' did not find a
matching property.
10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Manager}
Setting property 'manager.sendAllSessionsSize' to '200' did not find a
matching property.
10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Cluster/Manager}
Setting property 'manager.sendAllSessionsWaitTime' to '0' did not find a
matching property.
10-mar-2009 15:58:19 org.apache.tomcat.util.digester.SetPropertiesRule begin

Maybe without the "manager" prefix? sendAllSessions="false",
sendAllSessionsSize="200" and sendAllSessionsWaitTime=""?

Filip, we considered taking thread dumps, but we are using JDK 1.5 under
Windows. I heard it is not supported on Windows until JDK 1.6. Is that true?

Best regards
Mikel



2009/3/9 Rainer Jung <ra...@kippdata.de>

> On 09.03.2009 09:24, Mikel Ibiricu wrote:
>
>> [original message snipped]
>
> If you want to test with sendAllSessions set to false, then add
>
> manager.sendAllSessions="false"
>
> to the Manager element of the cluster configuration.
>
> You might then also want to configure:
>
> manager.sendAllSessionsSize: the number of session which will be serialized
> and send in one chunk (default if sendAllSessions="false": 1000)
>
> manager.sendAllSessionsWaitTime: the time in milliseconds between sending
> out consecutive session chunks (default if sendAllSessions="false": 2000)
>
> Regards,
>
> Rainer
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

Re: Tomcat Clustering trouble when starting up under high load

Posted by Rainer Jung <ra...@kippdata.de>.
On 09.03.2009 09:24, Mikel Ibiricu wrote:
> So, it works OK, but when starting up one of the nodes while over 500 sessions
> are alive in the other, it doesn't replicate anything. We could accept that it
> might not manage to replicate everything... but why does it replicate either
> everything or nothing? If it can't replicate all the sessions, we would accept
> that, but we would like it to replicate at least something... Could there be
> some problem with the config we are trying? We tried limiting the
> keepAliveCount of the senders, without any improvement.
>
> Reading the DeltaManager code, I have seen the sendAllSessions parameter.
> According to the in-line Javadoc and the implementation,
>
> /**
>  * handle receive that other node want all sessions ( restart )
>  * a) send all sessions with one message
>  * b) send session at blocks
>  * After sending send state is complete transfered
>  * @param msg
>  * @param sender
>  * @throws IOException
>  */
> protected void handleGET_ALL_SESSIONS(SessionMessage msg, Member sender)
>         throws IOException {
>
>     [...]
>
>     if (isSendAllSessions()) {
>         sendSessions(sender, currentSessions, findSessionTimestamp);
>     } else {
>         // send session at blocks
>         int len = currentSessions.length < getSendAllSessionsSize()
>                 ? currentSessions.length : getSendAllSessionsSize();
>         Session[] sendSessions = new Session[len];
>         for (int i = 0; i < currentSessions.length; i += getSendAllSessionsSize()) {
>             len = i + getSendAllSessionsSize() > currentSessions.length
>                     ? currentSessions.length - i : getSendAllSessionsSize();
>             System.arraycopy(currentSessions, i, sendSessions, 0, len);
>             sendSessions(sender, sendSessions, findSessionTimestamp);
>             if (getSendAllSessionsWaitTime() > 0) {
>                 try {
>                     Thread.sleep(getSendAllSessionsWaitTime());
>                 } catch (Exception sleep) {
>                 }
>             } //end if
>         } //for
>     } //end if
>
>     SessionMessage newmsg = new SessionMessageImpl(name,
>             SessionMessage.EVT_ALL_SESSION_TRANSFERCOMPLETE, null,
>             "SESSION-STATE-TRANSFERED", "SESSION-STATE-TRANSFERED" + getName());
>     newmsg.setTimestamp(findSessionTimestamp);
>     if (log.isDebugEnabled())
>         log.debug(sm.getString("deltaManager.createMessage.allSessionTransfered", getName()));
>     counterSend_EVT_ALL_SESSION_TRANSFERCOMPLETE++;
>     cluster.send(newmsg, sender);
> }
>
> it may cover our expectations, so I tried to include sendAllSessions="false"
> in the cluster Manager configuration
>
> <Manager className="org.apache.catalina.ha.session.DeltaManager"
>                     name="clusterPruebas6"
>                     stateTransferTimeout="180"
>                     expireSessionsOnShutdown="false"
>                     notifyListenersOnReplication="false"
>                     sendAllSessions="false"/>
>
> But it seems like this parameter is not configurable from the server.xml
> file. So, what can I do so that, if it's not possible to replicate all the
> sessions, at least something is replicated at startup?
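
[Editor's note: for readers who want to see the chunking arithmetic of the
quoted code in isolation, here is a small standalone sketch. SessionChunker is
a hypothetical helper written only for this illustration; it mirrors the block
loop in handleGET_ALL_SESSIONS but operates on plain strings instead of
Session objects and does not do any network sending.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SessionChunker {

    /**
     * Split the session array into blocks of at most chunkSize elements,
     * the same way the for-loop in handleGET_ALL_SESSIONS walks the array
     * in steps of getSendAllSessionsSize() and trims the final block.
     */
    public static List<String[]> chunk(String[] sessions, int chunkSize) {
        List<String[]> blocks = new ArrayList<>();
        for (int i = 0; i < sessions.length; i += chunkSize) {
            // last block may be shorter than chunkSize
            int len = Math.min(chunkSize, sessions.length - i);
            blocks.add(Arrays.copyOfRange(sessions, i, i + len));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // 5 sessions sent in blocks of 2 -> block sizes 2, 2, 1
        String[] sessions = {"s1", "s2", "s3", "s4", "s5"};
        List<String[]> blocks = chunk(sessions, 2);
        System.out.println(blocks.size());        // number of blocks
        System.out.println(blocks.get(2).length); // size of the last block
    }
}
```

With sendAllSessions="false", a node holding 500 sessions would thus answer a
GET_ALL_SESSIONS request with several smaller messages (paced by the wait
time) rather than one large one.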

If you want to test with sendAllSessions set to false, then add

manager.sendAllSessions="false"

to the Manager element of the cluster configuration.

You might then also want to configure:

manager.sendAllSessionsSize: the number of sessions which will be 
serialized and sent in one chunk (default if sendAllSessions="false": 1000)

manager.sendAllSessionsWaitTime: the time in milliseconds between 
sending out consecutive session chunks (default if 
sendAllSessions="false": 2000)
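
[Editor's note: putting these suggestions together with the Manager element
quoted earlier in the thread, the result would look roughly like this. This is
a sketch only: the manager.-prefixed attribute names follow Rainer's wording,
the size/wait values are illustrative rather than recommendations, and the
exact attribute placement should be checked against the clustering
documentation for the Tomcat version in use.]

```
<Manager className="org.apache.catalina.ha.session.DeltaManager"
         name="clusterPruebas6"
         stateTransferTimeout="180"
         expireSessionsOnShutdown="false"
         notifyListenersOnReplication="false"
         manager.sendAllSessions="false"
         manager.sendAllSessionsSize="1000"
         manager.sendAllSessionsWaitTime="2000"/>
```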

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org