Posted to users@tomcat.apache.org by Andy Pahne <an...@gmail.com> on 2013/04/04 12:43:47 UTC

problem with clustering

An application that has been running fine for years has suddenly started 
to perform inconsistently: sometimes it is as quick as ever, but 
sometimes a simple page request takes up to 30 seconds.

Since the performance degraded, we regularly find log entries like the 
following one in catalina.out (many of them, about 100 to 300 per hour 
on each host):

04.04.2013 11:51:53 org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, 6, 21}:4000,{-64, -88, 6, 21},4000, alive=1706334,id={-99 120 -58 21 -84 121 74 45 -104 -73 -123 -40 10 -76 70 59 }, payload={}, command={}, domain={}, ]]

We ruled out the recent changes to said application as the cause of 
the poor performance by simulating all sorts of heavy load on various 
test systems. It works fine in the test environment. However, in 
production it does not.

We are using the SimpleTcpCluster solution for clustering on Tomcat 6. 
The cluster has two nodes.

I am NOT suspecting a tomcat bug. And as I said I am not suspecting a 
performance bottleneck in our application or in the db queries it 
performs. At the moment I am thinking of a hardware failure of some kind 
(network interface, router etc.).

Do you have any experience with this problem and what did you do to 
resolve it?

Thanks,
Andy



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: problem with clustering

Posted by Andy Pahne <an...@gmail.com>.
On 05.04.2013 15:34, Daniel Mikusa wrote:
>> On 04.04.2013 15:01, Daniel Mikusa wrote:
>>
>>
>> The tomcat version is 6.0.18, running on Linux 2.6.24, Java version is 1.6.0_13.
> That's incredibly old, you should look at upgrading ASAP.
>

I know. That's not really my call, unfortunately.



Re: problem with clustering

Posted by Daniel Mikusa <dm...@vmware.com>.
On Apr 5, 2013, at 6:02 AM, Andy Pahne wrote:

> On 04.04.2013 15:01, Daniel Mikusa wrote:
> 
> 
> The tomcat version is 6.0.18, running on Linux 2.6.24, Java version is 1.6.0_13.

That's incredibly old, you should look at upgrading ASAP.

> 
> 
> 
>> It would be helpful to post your configuration, minus comments, as well as the exact version of Tomcat that you are running.
>> 
> 
> <?xml version='1.0' encoding='utf-8'?>
> <Server port="8005" shutdown="SHUTDOWN">
> 
>  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
>  <Listener className="org.apache.catalina.core.JasperListener" />
>  <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" />
>  <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />
> 
>  <GlobalNamingResources>
> 
>    <Resource name="UserDatabase" auth="Container"
>              type="org.apache.catalina.UserDatabase"
>              description="User database that can be updated and saved"
>              factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
>              pathname="conf/tomcat-users.xml" />
>  </GlobalNamingResources>
> 
>  <Service name="Catalina">
> 
>    <Connector port="8090" protocol="HTTP/1.1"
>               connectionTimeout="20000"
>               redirectPort="8443" />
> 
>    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
> 
>    <Engine name="Catalina" defaultHost="localhost">
>      <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>

While this will work, you can't customize any of the cluster configuration settings when you use it.  You might want to look at replacing this with the expanded XML for a cluster setup.  It's a lot more XML, but it gives you much more control over your configuration.  See the following link which shows what the above element is equivalent to.  

  https://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html#For_the_impatient

Once you make the switch, you can try adjusting the "dropTime" and "frequency" settings like I mentioned in my previous email.  You'd set them on the <Membership/> element.

  https://tomcat.apache.org/tomcat-6.0-doc/config/cluster-membership.html
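
As a rough sketch, the part of the expanded configuration that matters here is the <Channel> block. The class names and values below are believed to match the defaults shown in the how-to linked above, but they are reproduced from memory and should be checked against that page for your exact version. The Manager, Valve, Deployer and ClusterListener children are left out for brevity and should be copied from the how-to as well.

```xml
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="8">

  <!-- Manager, Valve, Deployer and ClusterListener elements omitted in this
       sketch; take them from the how-to. -->

  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <!-- frequency and dropTime (both in milliseconds) are the settings to adjust -->
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4"
                port="45564"
                frequency="500"
                dropTime="3000"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto"
              port="4000"
              autoBind="100"
              selectorTimeout="5000"
              maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <!-- The interceptor that logs the "Verification complete" messages -->
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  </Channel>
</Cluster>
```

The receiver port of 4000 in this sketch matches the port that shows up in the log entries from the original post.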

Dan



>      <Realm className="org.apache.catalina.realm.UserDatabaseRealm"
>             resourceName="UserDatabase"/>
>      <Host name="localhost"  appBase="webapps"
>            unpackWARs="true" autoDeploy="true"
>            xmlValidation="false" xmlNamespaceAware="false">
>      </Host>
>    </Engine>
>  </Service>
> </Server>
> 
> 
> 
>> If you suspect a network issue, you could try monitoring with Wireshark or tcpdump to capture the network packets.  Analysis of the packets could show if there is a problem.  Another option would be to try and use a tool like iperf to put a high load on your network and possibly trigger the problem.
>> 
>> Dan
>> 
>> 
>> 
> 




Re: problem with clustering

Posted by Andy Pahne <an...@gmail.com>.
On 04.04.2013 15:01, Daniel Mikusa wrote:


The tomcat version is 6.0.18, running on Linux 2.6.24, Java version is 
1.6.0_13.



> It would be helpful to post your configuration, minus comments, as well as the exact version of Tomcat that you are running.
>

<?xml version='1.0' encoding='utf-8'?>
<Server port="8005" shutdown="SHUTDOWN">

   <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
   <Listener className="org.apache.catalina.core.JasperListener" />
   <Listener className="org.apache.catalina.mbeans.ServerLifecycleListener" />
   <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />

   <GlobalNamingResources>

     <Resource name="UserDatabase" auth="Container"
               type="org.apache.catalina.UserDatabase"
               description="User database that can be updated and saved"
               factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
               pathname="conf/tomcat-users.xml" />
   </GlobalNamingResources>

   <Service name="Catalina">

     <Connector port="8090" protocol="HTTP/1.1"
                connectionTimeout="20000"
                redirectPort="8443" />

     <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

     <Engine name="Catalina" defaultHost="localhost">
       <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
       <Realm className="org.apache.catalina.realm.UserDatabaseRealm"
              resourceName="UserDatabase"/>
       <Host name="localhost"  appBase="webapps"
             unpackWARs="true" autoDeploy="true"
             xmlValidation="false" xmlNamespaceAware="false">
       </Host>
     </Engine>
   </Service>
</Server>



> If you suspect a network issue, you could try monitoring with Wireshark or tcpdump to capture the network packets.  Analysis of the packets could show if there is a problem.  Another option would be to try and use a tool like iperf to put a high load on your network and possibly trigger the problem.
>
> Dan
>
>
>


Re: problem with clustering

Posted by Daniel Mikusa <dm...@vmware.com>.
On Apr 4, 2013, at 6:43 AM, Andy Pahne wrote:

> 
> An application that has been running fine for years has suddenly started to perform inconsistently: sometimes it is as quick as ever, but sometimes a simple page request takes up to 30 seconds.

If you haven't changed anything with the application or your Tomcat configuration, then you'll want to look at the external resources that your application depends upon, such as a database, the network, shared file systems, etc…  If the performance of an external resource is suffering, it could definitely be causing problems for your application.


> 
> Since the performance degraded, we regularly find log entries like the following one in catalina.out (many of them, about 100 to 300 per hour on each host):
> 
> 04.04.2013 11:51:53 org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, 6, 21}:4000,{-64, -88, 6, 21},4000, alive=1706334,id={-99 120 -58 21 -84 121 74 45 -104 -73 -123 -40 10 -76 70 59 }, payload={}, command={}, domain={}, ]]

I think that you'll typically see these when there is a network issue, but you would see them anytime a member is timed out.

The connections between the nodes in your cluster are monitored with a heartbeat.  When a node's heartbeat is missed, the node is considered to have left the cluster.  To protect against false positives you can configure a TcpFailureDetector.  This listens for "memberDisappeared" events and, when one occurs, connects to the member via TCP to try to confirm its disappearance.

In your case, the message that you are seeing is indicating that the heartbeat failed, but that the TcpFailureDetector was able to verify the node still exists.  In other words, this is a false positive.

In addition to the TcpFailureDetector, you can also adjust the "frequency" and "dropTime" attributes to control how often heartbeats are sent and how long a member may go without a heartbeat before it is timed out.  You might try adjusting these settings to make the configuration more tolerant of your network.

  https://tomcat.apache.org/tomcat-6.0-doc/config/cluster-membership.html
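
For illustration, a more lenient <Membership/> element might look like the sketch below. The class name, multicast address and port are assumed to be the stock defaults from the Tomcat 6 clustering documentation. The frequency and dropTime values are made-up examples, not recommendations, and should be tuned against what you actually observe on your network.

```xml
<!-- Goes inside the <Channel> element of an expanded <Cluster> definition.
     Documented defaults: frequency="500" and dropTime="3000", both in ms. -->
<Membership className="org.apache.catalina.tribes.membership.McastService"
            address="228.0.0.4"
            port="45564"
            frequency="1000"
            dropTime="10000"/>
<!-- frequency: interval in ms between heartbeats sent by this node (example value).
     dropTime: ms without a heartbeat before a member is timed out (example value). -->
```

A larger dropTime makes false "memberDisappeared" events less likely, at the cost of taking longer to notice a node that really has gone down.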


> We ruled out the recent changes to said application as the cause of the poor performance by simulating all sorts of heavy load on various test systems. It works fine in the test environment. However, in production it does not.
> 
> We are using the SimpleTcpCluster solution for clustering on Tomcat 6. The cluster has two nodes.

It would be helpful to post your configuration, minus comments, as well as the exact version of Tomcat that you are running.


> 
> I am NOT suspecting a tomcat bug. And as I said I am not suspecting a performance bottleneck in our application or in the db queries it performs. At the moment I am thinking of a hardware failure of some kind (network interface, router etc.).
> 
> Do you have any experience with this problem and what did you do to resolve it?

If you suspect a network issue, you could try monitoring with Wireshark or tcpdump to capture the network packets.  Analysis of the packets could show if there is a problem.  Another option would be to try and use a tool like iperf to put a high load on your network and possibly trigger the problem.

Dan



> 
> Thanks,
> Andy
> 
> 
> 
> 
