Posted to users@tomcat.apache.org by Jimmy Phillips <ji...@yahoo.com> on 2009/04/02 17:20:58 UTC

tomcat 6 session replication issues

Hi,

I've been having issues with Tomcat session replication. I have a number
of Tomcat servers running in cluster mode behind an Apache load balancer.
The Tomcat version is 6.0.18 on CentOS 5.1. Running the cluster with the
DeltaManager seems to work fine; however, when I try to use the
BackupManager for session replication, I get the following entries in
the logs:

Apr 1, 2009 3:28:42 AM org.apache.catalina.tribes.transport.nio.NioReceiver socketTimeouts
WARNING: Channel key is registered, but has had no interest ops for the last 3000 ms. (cancelled:false):sun.nio.ch.SelectionKeyImpl@62af9d74 last access:2009-04-01 03:28:35.969
Apr 1, 2009 3:28:42 AM org.apache.catalina.tribes.transport.nio.NioReceiver socketTimeouts
WARNING: Channel key is registered, but has had no interest ops for the last 3000 ms. (cancelled:false):sun.nio.ch.SelectionKeyImpl@4c4947d3 last access:2009-04-01 03:28:35.969
Apr 1, 2009 3:29:04 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 99, 86, 47}:4000,{10, 99, 86, 47},4000, alive=1380182,id={-121 25 -2 -7 81 -1 76 3 -92 -20 122 69 67 102 -31 -15 }, payload={}, command={}, domain={}, ]] message. Will verify.
Apr 1, 2009 3:29:04 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 99, 86, 47}:4000,{10, 99, 86, 47},4000, alive=1380182,id={-121 25 -2 -7 81 -1 76 3 -92 -20 122 69 67 102 -31 -15 }, payload={}, command={}, domain={}, ]]
Apr 1, 2009 3:29:04 AM org.apache.catalina.tribes.tipis.AbstractReplicatedMap heartbeat
SEVERE: Unable to send AbstractReplicatedMap.ping message
org.apache.catalina.tribes.ChannelException: Operation has timed out(60000 ms.).; Faulty members:tcp://{10, 99, 86, 47}:4000;
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:97)
        at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
        at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
        at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:78)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.sendMessage(ThroughputInterceptor.java:61)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:73)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:87)
        at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
        at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
        at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
        at org.apache.catalina.tribes.group.RpcChannel.send(RpcChannel.java:89)
        at org.apache.catalina.tribes.tipis.AbstractReplicatedMap.ping(AbstractReplicatedMap.java:253)
        at org.apache.catalina.tribes.tipis.AbstractReplicatedMap.heartbeat(AbstractReplicatedMap.java:793)
        at org.apache.catalina.tribes.group.GroupChannel.heartbeat(GroupChannel.java:153)
        at org.apache.catalina.tribes.group.GroupChannel$HeartbeatThread.run(GroupChannel.java:661)

Of course, the above entry is just one of many, for the different hosts.
Searching the mailing lists, I found this post,
http://markmail.org/message/jv4dykh7fdhr4mvp, which looks like the same
problem I am having. That thread concludes that the problem is fixed by
a patch in revision 618823, so I compiled a version of the current 6.x
trunk (rev 759722) and deployed it to all the servers. However, the
problem still appears. I've attached a copy of the current server.xml
(it is common to all Tomcat instances).

I took a thread dump on one of the servers when these errors started
appearing, and the output is attached as thread_dump.txt (I removed the
threads that were run by our application).

This problem is reproducible each time I restart the servers. At this stage, I have no idea what to try next, so I'm looking forward to your replies.


Regards,
Jim.

Attached: server.xml, thread_dump.txt


      

Re: tomcat 6 session replication issues

Posted by Filip Hanik - Dev Lists <de...@hanik.com>.
Apr 1, 2009 3:28:42 AM 
org.apache.catalina.tribes.transport.nio.NioReceiver socketTimeouts
WARNING: Channel key is registered, but has had no interest ops for the 
last 3000 ms.

This is a sign of a thread being stuck, and it's confirmed by the session
transfer timing out.
Do a thread dump on the server where you are seeing this message; that
will tell us where your system is stuck.
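
For reference, a thread dump of a running Tomcat is usually taken from
outside the JVM, for example with `kill -3 <pid>` (the output typically
ends up in catalina.out) or with the JDK's jstack tool. The same
information is also available in-process through the standard
java.lang.management API. The following is only a minimal sketch of that
approach: the class name ThreadDumpSketch is made up for illustration,
and it dumps the threads of whatever JVM it runs in, so to inspect Tomcat
it would have to be called from code deployed inside that Tomcat
instance.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Tomcat or Tribes.
final class ThreadDumpSketch {

    /** Returns a plain-text dump of all live threads in the current JVM. */
    static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details, which not
        // every JVM supports
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip entries with no information
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("        at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In a dump taken while the warnings are being logged, the threads of
interest would be the ones whose stack contains
org.apache.catalina.tribes frames.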

Filip


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: tomcat 6 session replication issues

Posted by Jimmy Phillips <ji...@yahoo.com>.
The multicast address is in the <Membership/> element. The 10.x address in the <Receiver/> element is a regular address on a 10.x network, not a multicast address.




________________________________
From: Jorge Medina <jm...@e-dialog.com>
To: Tomcat Users List <us...@tomcat.apache.org>
Sent: Thursday, April 2, 2009 4:31:06 PM
Subject: RE: tomcat 6 session replication issues

What is your multicast address and port used by Tomcat to discover
members of the cluster?

Your server.xml has a note [10.x.x.x]. This does not look like a
multicast address.

http://tldp.org/HOWTO/Multicast-HOWTO-2.html





Re: tomcat 6 session replication issues

Posted by Jimmy Phillips <ji...@yahoo.com>.
I had a look at the Cluster Receiver reference, and I'm pretty sure the address must be the local address on which to listen for incoming data. Since the multicast route is set on the eth1 interface, I use the corresponding IP address (10.x).

From the documentation:

address: The address (network interface) to listen for incoming traffic. Same as the bind address. The default value is auto and translates to java.net.InetAddress.getLocalHost().getHostAddress(). 
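
A quick way to see what the "auto" default would resolve to on a given
host is to make the same call outside Tomcat. The sketch below is an
illustration only: the class name ReceiverAddressSketch and its helper
methods are hypothetical, and the output depends entirely on the host's
name resolution and network interfaces. It also lists every interface
address, so the result can be compared with the address bound to eth1.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Tomcat or Tribes.
final class ReceiverAddressSketch {

    /** The expression the documentation gives for the "auto" default. */
    static String autoAddress() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "unresolved (" + e.getMessage() + ")";
        }
    }

    /** Every address on every interface, formatted as "name address". */
    static List<String> interfaceAddresses() {
        List<String> result = new ArrayList<String>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    result.add(nic.getName() + " " + addr.getHostAddress());
                }
            }
        } catch (SocketException e) {
            // Interfaces could not be enumerated; return what was collected
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("auto resolves to: " + autoAddress());
        for (String line : interfaceAddresses()) {
            System.out.println(line);
        }
    }
}
```

If the first line does not show the eth1 address, that would be one
reason to set the address attribute explicitly rather than rely on auto.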

Any other pointers?
Jim



________________________________
From: Jorge Medina <jm...@e-dialog.com>
To: Tomcat Users List <us...@tomcat.apache.org>
Sent: Thursday, April 2, 2009 4:31:06 PM
Subject: RE: tomcat 6 session replication issues

What is your multicast address and port used by Tomcat to discover
members of the cluster?

Your server.xml has a note [10.x.x.x]. This does not look like a
multicast address.

http://tldp.org/HOWTO/Multicast-HOWTO-2.html





RE: tomcat 6 session replication issues

Posted by Jorge Medina <jm...@e-dialog.com>.
What is your multicast address and port used by Tomcat to discover
members of the cluster?
 
Your server.xml has a note [10.x.x.x]. This does not look like a
multicast address.
 
http://tldp.org/HOWTO/Multicast-HOWTO-2.html
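
Whether an address falls in the IPv4 multicast range (224.0.0.0 to
239.255.255.255) can be checked with java.net.InetAddress. A minimal
sketch, assuming 228.0.0.4 (the default address documented for Tomcat's
<Membership/> element) and the 10.99.86.47 address that appears in the
log of the original post; the class name MulticastCheck is made up for
illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Tomcat or Tribes.
final class MulticastCheck {

    /** True if the given IP literal is a multicast address. */
    static boolean isMulticast(String ipLiteral) {
        try {
            // For a literal IP address, getByName only validates the
            // format; no DNS lookup is performed.
            return InetAddress.getByName(ipLiteral).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isMulticast("228.0.0.4"));   // true
        System.out.println(isMulticast("10.99.86.47")); // false
    }
}
```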
 
 
 
