You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@tomcat.apache.org by Olve Hansen <ol...@intermedia.uib.no> on 2005/11/08 11:59:54 UTC

session replication errors

Hi,
First I must congratulate on the improvements of the clustering docs!
Now I don't have to guess nearly as much as before when trying to set up
clustering. :-)

But I still have some troubles regarding clustering.
We have a webapp clustered over three tomcats, all on the same machine,
load balanced by mod_jk in an Apache frontend, using sticky lb, session
replication by fastasyncqueue. 

It seems OK - no session dropouts in sticky mode, we can turn on and off
tomcats as we please.

But, in the logs, we see numerous:
SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
Connection reset by peer' closing channel
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
at
o.a.c.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125)
at
o.a.c.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)

I am trying to track these down, as they are marked as SEVERE. The
tomcats sits on a Red Hat Linux server. It is configured with two
networks, one open to the world, and firewalled, the other is internal,
and is fully open. The internal net is used for clustering, db-access,
and mod_jk connections.

How severe are these? Can they be a result of two near concurrent
requests, and the second request cancels the serialisation of the first
session object? If not, where should I start to look?

Log files and cluster setup follows:

cluster part of server.xml (only tcpListenPort differs among our
configurations):
=================================================================================
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
managerClassName="org.apache.catalina.cluster.session.DeltaManager"
doClusterLog="true"
clusterLogName="clusterlog"
expireSessionsOnShutdown="false"
useDirtyFlag="true"
notifyListenersOnReplication="true"
notifySessionListenersOnReplication="true">

<Membership
className="org.apache.catalina.cluster.mcast.McastService"
mcastAddr="228.0.0.4"
mcastPort="45564"
mcastFrequency="500"
mcastDropTime="3000"/>

<Receiver
className="org.apache.catalina.cluster.tcp.ReplicationListener"
tcpListenAddress="192.168.5.112"
tcpListenPort="4003"
tcpSelectorTimeout="100"
tcpThreadCount="2"/>

<Sender
className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
replicationMode="fastasyncqueue"
ackTimeout="15000"/>

<Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*
\.txt;"/>

<Deployer className="org.apache.catalina.cluster.deploy.FarmWarDeployer"
tempDir="${catalina.base}/war-temp"
deployDir="${catalina.base}/webapps/"
watchDir="${catalina.base}/war-listen/"
watchEnabled="false"/>
	
<ClusterListener
className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
	
<Valve
className="org.apache.catalina.cluster.session.JvmRouteBinderValve"
enabled="true"/>
	
<ClusterListener
className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
<ClusterListener
className="org.apache.catalina.cluster.session.JvmRouteSessionIDBinderListener"/>
</Cluster>


Logs demonstrating the steps toward exceptions:
=================================================================================
#######Request #1, trying to access  home page, redirected to login
page:

#######Access log from apache:
i.p.add.ress - - [07/Nov/2005:11:11:00 +0100] "GET /npweb/t-app
HTTP/1.1" 302 -
i.p.add.ress - - [07/Nov/2005:11:11:00 +0100]
"GET /npweb/t-app?service=external/bitecore.loginPage HTTP/1.1" 200 4633
i.p.add.ress - - [07/Nov/2005:11:11:00 +0100] "GET /favicon.ico
HTTP/1.1" 404 288




#######Tail from tomcat logs:
==> tomcat1/logs/catalina.out <==
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 7, 2005:11:11:00 AM 1 -
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260068
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 7, 2005:11:11:00 AM 0 -
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260072
[np-norway] INFO [TP-Processor29]
BiteEngine.registerTestFormatConfigs(91) | registering all test format
configs
[np-norway] INFO [TP-Processor29]
BiteEngine.registerTestFormatConfigs(108) | TOTALLY REGISTERED 12 TEST
FORMAT CONFIGS IN SYSTEM
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 7, 2005:11:11:00 AM 0 -
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260109

==> tomcat2/logs/catalina.out <==
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:11:00 AM 0 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260068 true
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:11:00 AM 1 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260072 true
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:11:00 AM 0 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260109 true

==> tomcat3/logs/catalina.out <==
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:11:00 AM 0 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260068 true
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:11:00 AM 1 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260072 true
Nov 7, 2005 11:11:00 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:11:00 AM 1 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358260109 true


#######Request #2 (after login):

#######Access log from apache:
i.p.add.ress - - [07/Nov/2005:11:12:03 +0100]
"POST /npweb/j_acegi_security_check HTTP/1.1" 302 -
i.p.add.ress - - [07/Nov/2005:11:12:03 +0100] "GET /npweb/t-app
HTTP/1.1" 200 6454
i.p.add.ress - - [07/Nov/2005:11:12:05 +0100] "GET /favicon.ico
HTTP/1.1" 404 288

#######Tail from tomcat logs:
==> tomcat1/logs/catalina.out <==
[np-norway] INFO [TP-Processor29] LoggerListener.onApplicationEvent(125)
| Authentication success for user: UserName; details:
net.sf.acegisecurity.ui.WebAuthenticationDetails@107513d:
RemoteIpAddress: i.p.add.ress; SessionId:
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc1
Nov 7, 2005 11:12:03 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 7, 2005:11:12:03 AM 2 -
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358323369


==> tomcat2/logs/catalina.out <==
Nov 7, 2005 11:12:03 AM
org.apache.catalina.cluster.tcp.TcpReplicationThread run
SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
Connection reset by peer' closing channel
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
        at
org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125)
        at
org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)
Nov 7, 2005 11:12:03 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:12:03 AM 0 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358323369 true
Nov 7, 2005 11:12:03 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 7, 2005:11:12:03 AM 2 -
55AB686217BC200D0D6F026EEC3FA5C7.bserv2tc2-EXPIRED-MSG

==> tomcat3/logs/catalina.out <==
Nov 7, 2005 11:12:03 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:12:03 AM 1 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358323369 true
Nov 7, 2005 11:12:03 AM
org.apache.catalina.cluster.tcp.TcpReplicationThread run
SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
Connection reset by peer' closing channel
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
        at
org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125)
        at
org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)
Nov 7, 2005 11:12:03 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:12:03 AM 1 192.168.5.112:4,002
55AB686217BC200D0D6F026EEC3FA5C7.bserv2tc2-EXPIRED-MSG true

==> tomcat1/logs/catalina.out <==
Nov 7, 2005 11:12:03 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:12:03 AM 0 192.168.5.112:4,002
55AB686217BC200D0D6F026EEC3FA5C7.bserv2tc2-EXPIRED-MSG true
Nov 7, 2005 11:12:04 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 7, 2005:11:12:04 AM 0 -
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358324900

==> tomcat2/logs/catalina.out <==
Nov 7, 2005 11:12:04 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:12:04 AM 1 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358324900 true

==> tomcat3/logs/catalina.out <==
Nov 7, 2005 11:12:04 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 7, 2005:11:12:04 AM 0 192.168.5.112:4,001
B07DBFF28ACB2BAC0F28FE9DAC65A69B.bserv2tc11131358324900 true

Re: session replication errors

Posted by Peter Rossbach <pr...@objektpark.de>.

Hey Olve,

what I mean with your Cluster Config is:
    You have register two ClusterSessionListener!

>>>>>>
<ClusterListener
className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
<Valve
className="org.apache.catalina.cluster.session.JvmRouteBinderValve"
enabled="true" />	

<ClusterListener
className="org.apache.catalina.cluster.session.ClusterSessionListener" />
<ClusterListener
className="org.apache.catalina.cluster.session.JvmRouteSessionIDBinderListener" />

<<<<

Peter

Peter Rossbach schrieb:

> Hey Olve,
>
> I found a the NPE shutdown problem, but I must setup my Linux System
> to look inside the strange timeout exception. I want remove the 
> waitForAck flag
> at the next release, but my time is currently a little bit limited. 
> Please look and test
> my next cluster checkin .-)
>
> Thanks for reporting the bug.
> Peter
>
> Olve Hansen schrieb:
>
>> Hey, thanks for your fixes, sadly it didn't fix my problem. That is if I
>> am doing things right, which there are no guarantees for. I will
>> continue to hunt down the problem(s).
>> If I configure my cluster as you suggested in your first mail, the
>> Exceptions disappears, if I use my original config it is as it were. I
>> would believe that a valid configuration shouldn't make tomcat throw
>> severe exceptions... I also started experiencing shutdown exceptions
>> from the svn version. 5.5.12 didn't have these. (stacktrace can be found
>> in bottom of mail).
>>
>> Are there any gotchas for using waitForAck="true" and
>> keepAliveTimeout="-1"? For example less performance? I guess that
>> waiting for acks from the whole cluster can be lengthy, especially if we
>> want to use pooled replication?
>>
>> You also said in an earlier mail that I didn't need both the
>> ClusterSessionListener, and the JvmRouteSessionIDBinderListener, why is
>> that? And which do I need? And why are both listed in the example set-up
>> in clusterdocs, if only one is needed?
>>
>> Now for the problem (sorry for the length of this mail).
>>
>>  
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>
>
>





---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

RE: session replication errors

Posted by Richard Mixon <rn...@qwest.net>.

And does this have to do with "session replication errors"?

At least when hijacking a thread remember to change the Subject line. But
better still create an entirely new email and address it to the list. This
will make it easier on those of us that use threaded mail readers.

Concerning your question. Are you using the Sysdeo Tomcat plugin or WTP? If
using Sysdeo, I am using Eclipse 3.1 and Sysdeo and its works fine either
way.

Please also tell the operating system and version and Java version - it
might help.

HTH - Richard

> -----Original Message-----
> From: Kanin Chen [mailto:pujiang.wang@gmail.com] 
> Sent: Thursday, November 17, 2005 3:33 AM
> To: Tomcat Users List
> Subject: Re: session replication errors
> 
> Howdy,
>  I am using Eclipse 3.1 and TC 5.5.12, when starting TC 
> 5.5.12 in Eclipse, the manage webapp is disabled while 
> starting outside Eclipse is valid.
>  The server.xml in eclipse is all the same to server.xml in TC.
> any hint?
>  Thanks in advance.
> Best Regards.
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: session replication errors

Posted by Kanin Chen <pu...@gmail.com>.

Howdy,
 I am using Eclipse 3.1 and TC 5.5.12, when starting TC 5.5.12 in Eclipse,
the manage webapp is disabled while starting outside Eclipse is valid.
 The server.xml in eclipse is all the same to server.xml in TC.
any hint?
 Thanks in advance.
Best Regards.

Re: session replication errors

Posted by Peter Rossbach <pr...@objektpark.de>.

Hey Olve,

I found a the NPE shutdown problem, but I must setup my Linux System
to look inside the strange timeout exception. I want remove the 
waitForAck flag
at the next release, but my time is currently a little bit limited. 
Please look and test
my next cluster checkin .-)

Thanks for reporting the bug.
Peter

Olve Hansen schrieb:

>Hey, thanks for your fixes, sadly it didn't fix my problem. That is if I
>am doing things right, which there are no guarantees for. I will
>continue to hunt down the problem(s). 
>
>If I configure my cluster as you suggested in your first mail, the
>Exceptions disappears, if I use my original config it is as it were. I
>would believe that a valid configuration shouldn't make tomcat throw
>severe exceptions... I also started experiencing shutdown exceptions
>from the svn version. 5.5.12 didn't have these. (stacktrace can be found
>in bottom of mail).
>
>Are there any gotchas for using waitForAck="true" and
>keepAliveTimeout="-1"? For example less performance? I guess that
>waiting for acks from the whole cluster can be lengthy, especially if we
>want to use pooled replication?
>
>You also said in an earlier mail that I didn't need both the
>ClusterSessionListener, and the JvmRouteSessionIDBinderListener, why is
>that? And which do I need? And why are both listed in the example set-up
>in clusterdocs, if only one is needed?
>
>Now for the problem (sorry for the length of this mail).
>
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: session replication errors

Posted by Olve Hansen <ol...@intermedia.uib.no>.

Hey, thanks for your fixes, sadly it didn't fix my problem. That is if I
am doing things right, which there are no guarantees for. I will
continue to hunt down the problem(s). 

If I configure my cluster as you suggested in your first mail, the
Exceptions disappears, if I use my original config it is as it were. I
would believe that a valid configuration shouldn't make tomcat throw
severe exceptions... I also started experiencing shutdown exceptions
from the svn version. 5.5.12 didn't have these. (stacktrace can be found
in bottom of mail).

Are there any gotchas for using waitForAck="true" and
keepAliveTimeout="-1"? For example less performance? I guess that
waiting for acks from the whole cluster can be lengthy, especially if we
want to use pooled replication?

You also said in an earlier mail that I didn't need both the
ClusterSessionListener, and the JvmRouteSessionIDBinderListener, why is
that? And which do I need? And why are both listed in the example set-up
in clusterdocs, if only one is needed?

Now for the problem (sorry for the length of this mail).

My setup is:
The latest sources (as of 08:30 this morning, GMT+1 - 15/11-05) 
I have split tomcat in CATALINA_HOME (the compiled one) and two
CATALINA_BASE: cluster1, and cluster2, jvmroute set accordingly. 

The CATALINA_BASE folders have webapps/servlet-examples from 5.5.12,
where I have modified the session example to be distributable.
For first tests I use this configuration:
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
managerClassName="org.apache.catalina.cluster.session.DeltaManager"
doClusterLog="true" clusterLogName="clusterlog"
expireSessionsOnShutdown="false" useDirtyFlag="true"
notifyListenersOnReplication="true"
notifySessionListenersOnReplication="true">

<Membership 
className="org.apache.catalina.cluster.mcast.McastService"
mcastAddr="228.0.0.4"
mcastPort="45564"
mcastFrequency="500"
mcastDropTime="3000"/>

<Receiver 
className="org.apache.catalina.cluster.tcp.ReplicationListener"
tcpListenAddress="auto"
tcpListenPort="4001"
tcpSelectorTimeout="100"
tcpThreadCount="2"/>

<Sender
className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
replicationMode="fastasyncqueue"
ackTimeout="15000"/>

<Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*
\.txt;"/>
                   
<Deployer className="org.apache.catalina.cluster.deploy.FarmWarDeployer"
tempDir="${catalina.base}/war-temp"
deployDir="${catalina.base}/webapps/"
watchDir="${catalina.base}/war-listen/"
watchEnabled="false"/>

<ClusterListener
className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
<Valve
className="org.apache.catalina.cluster.session.JvmRouteBinderValve"
enabled="true" />	

<ClusterListener
className="org.apache.catalina.cluster.session.ClusterSessionListener" />
<ClusterListener
className="org.apache.catalina.cluster.session.JvmRouteSessionIDBinderListener" />
        </Cluster>

When setting value 'asdf' in session in cluster1, and then switching to
see the same servlet in cluster2, the replicated values appear ok, but
if I wait some time (not doing any requests), I get the exceptions:

==> cluster1/logs/catalina.out <==
Nov 15, 2005 9:35:42 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 15, 2005:9:35:42 AM 1 -
BA25CFCE25F1E939EEF6F559D0A721B1.cluster1-1132043742511

==> cluster1/logs/localhost.2005-11-15.log <==
Nov 15, 2005 9:35:42 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener:
attributeAdded('BA25CFCE25F1E939EEF6F559D0A721B1.cluster1', 'asdf', '')
Nov 15, 2005 9:35:42 AM org.apache.catalina.core.ApplicationContext log
INFO: InvokerFilter(ApplicationFilterConfig[name=Path Mapped Filter,
filterClass=filters.ExampleFilter]): 7 milliseconds

==> cluster2/logs/catalina.out <==
Nov 15, 2005 9:35:42 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 15, 2005:9:35:42 AM 15 127.0.0.1:4,001
BA25CFCE25F1E939EEF6F559D0A721B1.cluster1-1132043742511 true

==> cluster2/logs/localhost.2005-11-15.log <==
Nov 15, 2005 9:35:42 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener:
attributeAdded('BA25CFCE25F1E939EEF6F559D0A721B1.cluster1', 'asdf', '')
Nov 15, 2005 9:35:42 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener:
attributeReplaced('BA25CFCE25F1E939EEF6F559D0A721B1.cluster1', 'asdf',
'')

==> cluster1/logs/catalina.out <==
Nov 15, 2005 9:36:05 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logReceiveMessage
INFO: RECEIVE Nov 15, 2005:9:36:05 AM 1 127.0.0.1:4,002
BA25CFCE25F1E939EEF6F559D0A721B1.cluster1-1132043765344 true

==> cluster2/logs/catalina.out <==
Nov 15, 2005 9:36:05 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
logSendMessage
INFO: SEND Nov 15, 2005:9:36:05 AM 4 -
BA25CFCE25F1E939EEF6F559D0A721B1.cluster1-1132043765344

==> cluster2/logs/localhost.2005-11-15.log <==
Nov 15, 2005 9:36:05 AM org.apache.catalina.core.ApplicationContext log
INFO: InvokerFilter(ApplicationFilterConfig[name=Path Mapped Filter,
filterClass=filters.ExampleFilter]): 13 milliseconds

<inactivity period>

==> cluster2/logs/catalina.out <==
Nov 15, 2005 9:36:59 AM
org.apache.catalina.cluster.tcp.TcpReplicationThread run
SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
Connection reset by peer' closing channel
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
        at
org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125)
        at
org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)

==> cluster1/logs/catalina.out <==
Nov 15, 2005 9:37:12 AM
org.apache.catalina.cluster.tcp.TcpReplicationThread run
SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
Connection reset by peer' closing channel
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
at
org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125)
at
org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)


Also, when shutting down the nodes I get some exceptions (same for both
nodes). This also happens if i turn off keppAliveTimeout, and set
waitForAck to be true (as suggested in first mail). 

cluster2/runner.sh stop
Using CATALINA_BASE:   /usr/local/cluster2
Using
CATALINA_HOME:   /home/olve/utvikling/svn-tomcat/tomcat-build/build/build
Using CATALINA_TMPDIR: /usr/local/cluster2/temp
Using JRE_HOME:       /usr/lib/j2sdk1.5-sun
olve@olve:/usr/local $
==> cluster2/logs/catalina.out <==
Nov 15, 2005 9:54:18 AM org.apache.coyote.http11.Http11BaseProtocol
pause
INFO: Pausing Coyote HTTP/1.1 on http-8082
Nov 15, 2005 9:54:19 AM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
Nov 15, 2005 9:54:19 AM org.apache.catalina.cluster.session.DeltaManager
stop
INFO: Manager [/servlets-examples] expiring sessions upon shutdown
Nov 15, 2005 9:54:19 AM
org.apache.catalina.cluster.session.JvmRouteBinderValve stop
INFO: JvmRouteBinderValve stopped
Nov 15, 2005 9:54:19 AM
org.apache.catalina.cluster.deploy.FarmWarDeployer stop
INFO: Cluster FarmWarDeployer stopped.
Nov 15, 2005 9:54:19 AM org.apache.catalina.cluster.util.FastQueue
remove
INFO: FastQueue: queue disabled, remove aborted
Nov 15, 2005 9:54:19 AM
org.apache.catalina.cluster.tcp.ReplicationTransmitter stop
INFO: Stopped ClusterSender at cluster
Catalina:type=Cluster,host=localhost with name
Catalina:type=ClusterSender,host=localhost
Nov 15, 2005 9:54:19 AM
org.apache.catalina.cluster.tcp.ReplicationListener listen
SEVERE: Unable to process request in ReplicationListener
java.nio.channels.ClosedSelectorException
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:66)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at
org.apache.catalina.cluster.tcp.ReplicationListener.listen(ReplicationListener.java:130)
        at
org.apache.catalina.cluster.tcp.ClusterReceiverBase.run(ClusterReceiverBase.java:394)
        at java.lang.Thread.run(Thread.java:595)
Nov 15, 2005 9:54:19 AM
org.apache.catalina.cluster.tcp.ClusterReceiverBase run
SEVERE: Unable to start cluster listener.
java.lang.NullPointerException
        at
org.apache.catalina.cluster.tcp.ReplicationListener.listen(ReplicationListener.java:182)
        at
org.apache.catalina.cluster.tcp.ClusterReceiverBase.run(ClusterReceiverBase.java:394)
        at java.lang.Thread.run(Thread.java:595)
Nov 15, 2005 9:54:19 AM org.apache.coyote.http11.Http11BaseProtocol
destroy
INFO: Stopping Coyote HTTP/1.1 on http-8082
Nov 15, 2005 9:54:19 AM org.apache.catalina.core.AprLifecycleListener
lifecycleEvent
INFO: Failed shutdown of Apache Portable Runtime

==> cluster2/logs/localhost.2005-11-15.log <==
Nov 15, 2005 9:54:19 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: contextDestroyed()
Nov 15, 2005 9:54:19 AM org.apache.catalina.core.ApplicationContext log
INFO: ContextListener: contextDestroyed()
Nov 15, 2005 9:54:19 AM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: contextDestroyed()
Nov 15, 2005 9:54:19 AM org.apache.catalina.core.ApplicationContext log
INFO: ContextListener: contextDestroyed()

==> cluster1/logs/catalina.out <==
Nov 15, 2005 9:54:23 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster
memberDisappeared
INFO: Received member
disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://127.0.0.1:4002,catalina,127.0.0.1,4002, alive=1269259]
Nov 15, 2005 9:54:23 AM org.apache.catalina.cluster.util.FastQueue
remove
INFO: FastQueue: queue disabled, remove aborted

Re: session replication errors

Posted by Olve Hansen <ol...@intermedia.uib.no>.

Thanks for pointing me in the right direction. 
What I ended up doing was 

svn checkout
http://svn.apache.org/repos/asf/tomcat/build/tc5.5.x/resources/
mkdir build-tomcat
cp resources/build.xml build-tomcat
cd build-tomcat
ant

Subversion complained that I couldn't check out a single file, so I
grabbed the whole resources folder instead. Trying to 'ant' from there
gave me another svn error as it (ant+svn) couldn't checkout the tomcat
sources in a folder already under svn control.

Thanks for your help Mark, appreciate it!

Olve

man, 14,.11.2005 kl. 19.01 +0000, skrev Mark Thomas:
> Olve Hansen wrote:
> > On trying to build I am encountering problems. There is no documentation
> > on how to build from svn. I have a working build folder using the docs
> > from http://tomcat.apache.org/tomcat-5.5-doc/building.html
> > But I dont see your changes in DataSender.java after CVS update.
> 
> The 5.5.12 docs still refer to CVS. The next release has already been 
> updated to refer to SVN. The 5.5.x docs on the website will be updated 
> once the next release has been made.
> 
> Meanwhile, the following should get you going:
> svn checkout 
> http://svn.apache.org/repos/asf/tomcat/build/tc5.5.x/resources/build.xml 
> /tomcat-build
> cd /tomcat-build
> ant
> 
> These commands might not be 100% correct (I don't have linux box to 
> check them on) but should give you the idea.
> 
> Mark

Re: session replication errors

Posted by Mark Thomas <ma...@apache.org>.

Olve Hansen wrote:
> On trying to build I am encountering problems. There is no documentation
> on how to build from svn. I have a working build folder using the docs
> from http://tomcat.apache.org/tomcat-5.5-doc/building.html
> But I dont see your changes in DataSender.java after CVS update.

The 5.5.12 docs still refer to CVS. The next release has already been 
updated to refer to SVN. The 5.5.x docs on the website will be updated 
once the next release has been made.

Meanwhile, the following should get you going:
svn checkout 
http://svn.apache.org/repos/asf/tomcat/build/tc5.5.x/resources/build.xml 
/tomcat-build
cd /tomcat-build
ant

These commands might not be 100% correct (I don't have linux box to 
check them on) but should give you the idea.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: session replication errors

Posted by Olve Hansen <ol...@intermedia.uib.no>.

lør, 12,.11.2005 kl. 13.43 +0100, skrev Peter Rossbach:
> Hey Olve,
> 
> I found a bug inside pushMessage and I hopefully fix it. Please checkout 
> the tomcat svn head,
> build a tomcat,  made a test and report the results
> 
> Many thanks that you report the problem,
> Peter

Hi, and thanks for trying to fix the problem. Dunno if it is fixed yet
for the following reason(s):

On trying to build I am encountering problems. There is no documentation
on how to build from svn. I have a working build folder using the docs
from http://tomcat.apache.org/tomcat-5.5-doc/building.html
But I dont see your changes in DataSender.java after CVS update.

If I checkout the SVN
http://svn.apache.org/repos/asf/tomcat/build/tc5.5.x/ I see your
changes, I am unable to build with the supplied build.xml

When trying the build.xml from SVN (above) I can see from the output
that a lot of properties is not set, but I am not sure where to set them
and what values to set them to. as all property file imports in the
build.xml files are commented out. No default/sample configuration is
given either.

It seems to me that the docs on how to build tomcat is rather out of
date. 
Hopefully someone can give me (and others in need) instructions on how
to build tomcat using the SVN repositories.

The build fails in the following way:

olve@olve:~/utvikling/svn-tomcat/tc5.5.x $ ant
Buildfile: build.xml

deploy-prepare:

deploy-static:

deploy:
     [echo] Target: Catalina - Deploy ...

flags:

flags.display:
     [echo] --- Build environment for Catalina ---
     [echo] If ${property_name} is displayed, then the property is not
set)
     [echo] --- Build options ---
     [echo] full.dist=${full.dist}
     [echo] build.sysclasspath=${build.sysclasspath}
     [echo] compile.debug=${compile.debug}
     [echo] compile.deprecation=${compile.deprecation}
     [echo] compile.optimize=${compile.optimize}
     [echo] --- Ant Flags ---
     [echo] <style> task available (required)=true
     [echo] --- JDK ---
     [echo] jdk.1.2.present=true
     [echo] jdk.1.3.present=true
     [echo] jdk.1.4.present=true
     [echo] --- Source Dependencies ---
     [echo] jtc.home.present=${jtc.home.present}
     [echo] --- Required Libraries ---
     [echo] beanutils.present=${beanutils.present}
     [echo] collections.present=${collections.present}
     [echo] digester.present=${digester.present}
     [echo] jaxp.present=true
     [echo] jndi.present=true
     [echo] logging.present=${logging.present}
     [echo] regexp.present=${regexp.present}
     [echo] --- Optional Libraries ---
     [echo] dbcp.present=${dbcp.present}
     [echo] fileupload.present=${fileupload.present}
     [echo] jaas.present=true
     [echo] javamail.present=${javamail.present}
     [echo] jmx.present=true
     [echo] jsse.present=true
     [echo] jta.present=${jta.present}
     [echo] junit.present=true
     [echo] lang.present=${lang.present}
     [echo] launcher.present=${launcher.present}
     [echo] launcher.bootstrap.present=${launcher.bootstrap.present}
     [echo] ldap.present=true
     [echo] modeler.present=${modeler.present}
     [echo] pool.present=${pool.present}
     [echo] --- Required JARs ---
     [echo] jndi.jar.present(except JDK 1.3+)=${jndi.jar.present}
     [echo] regexp.jar.present=${regexp.jar.present}
     [echo] servlet-api.jar.present=${servlet-api.jar.present}
     [echo] xerces2.jars.present(except JDK 1.4+)=
${xerces2.jars.present}
     [echo] --- Optional JARs ---
     [echo] dbcp.jar.present=${dbcp.jar.present}
     [echo] fileupload.jar.present=${fileupload.jar.present}
     [echo] jaas.jar.present=${jaas.jar.present}
     [echo] javamail.jar.present=${javamail.jar.present}
     [echo] jmx.jar.present=${jmx.jar.present}
     [echo] jta.jar.present=${jta.jar.present}
     [echo] junit.jar.present=${junit.jar.present}
     [echo] modeler.jar.present=${modeler.jar.present}
     [echo] pool.jar.present=${pool.jar.present}
     [echo] --- Conditional compilation flags ---
     [echo] compile.dbcp=${compile.dbcp}
     [echo] compile.jaas=true
     [echo] compile.javamail=${compile.javamail}
     [echo] compile.jmx=${compile.jmx}
     [echo] compile.jndi=true
     [echo] compile.jsse=true
     [echo] compile.jta=${compile.jta}
     [echo] compile.junit=true
     [echo] compile.ldap=true
     [echo] compile.ssi=true
     [echo] --- Distribution flags ---
     [echo] copy.dbcp.jar=${copy.dbcp.jar}
     [echo] copy.jmx.jar=${copy.jmx.jar}
     [echo] copy.launcher.jars=${copy.launcher.jars}
     [echo] copy.logging.jar=${copy.logging.jar}
     [echo] copy.modeler.jar=${copy.modeler.jar}
     [echo] copy.pool.jar=${copy.pool.jar}

build-prepare:

copy-fileupload.jar:

copy-launcher.jars:

copy-modeler.jar:

build-static:

BUILD FAILED
/home/olve/utvikling/svn-tomcat/tc5.5.x/build.xml:45: The following
error occurred while executing this line:
/home/olve/utvikling/svn-tomcat/tc5.5.x/catalina/build.xml:547: Warning:
Could not find
file /home/olve/utvikling/svn-tomcat/tc5.5.x/catalina/${commons-logging-api.jar} to copy.

Total time: 1 second
olve@olve:~/utvikling/svn-tomcat/tc5.5.x $

Re: session replication errors

Posted by Peter Rossbach <pr...@objektpark.de>.

Hey Olve,

I found a bug inside pushMessage and I hopefully fix it. Please checkout 
the tomcat svn head,
build a tomcat,  made a test and report the results

Many thanks that you report the problem,
Peter

Peter Rossbach schrieb:

> Hey Olve,
>
> I look more deeply inside the code and find that 
> o.a.c.cluster.tcp.DataSender.checkKeepAlive and sendMessage are both 
> synchronized messages.
> That means that we currently don't close the socket as we transfer a 
> message. But this not means that I found
> the bug! ...
>
> I also look at your config and see that you configure two 
> ClusterSessionListener, please remove one!
>
> <ClusterListener
> className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
>
> Thanks
> Peter
>
> Olve Hansen schrieb:
>
>> Hi,
>> First I must congratulate on the improvements of the clustering docs!
>> Now I don't have to guess nearly as much as before when trying to set up
>> clustering. :-)
>>
>> But I still have some troubles regarding clustering.
>> We have a webapp clustered over three tomcats, all on the same machine,
>> load balanced by mod_jk in an Apache frontend, using sticky lb, session
>> replication by fastasyncqueue.
>> It seems OK - no session dropouts in sticky mode, we can turn on and off
>> tomcats as we please.
>>
>> But, in the logs, we see numerous:
>> SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
>> Connection reset by peer' closing channel
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcher.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
>> at
>> o.a.c.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125) 
>>
>> at
>> o.a.c.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)
>>
>> I am trying to track these down, as they are marked as SEVERE. The
>> tomcats sits on a Red Hat Linux server. It is configured with two
>> networks, one open to the world, and firewalled, the other is internal,
>> and is fully open. The internal net is used for clustering, db-access,
>> and mod_jk connections.
>>
>> How severe are these? Can they be a result of two near concurrent
>> requests, and the second request cancels the serialisation of the first
>> session object? If not, where should I start to look?
>>
>> Log files and cluster setup follows:
>>
>>  
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>
>
>




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: session replication errors

Posted by Peter Rossbach <pr...@objektpark.de>.

Hey Olve,

I look more deeply inside the code and find that 
o.a.c.cluster.tcp.DataSender.checkKeepAlive and sendMessage are both 
synchronized messages.
That means that we currently don't close the socket as we transfer a 
message. But this not means that I found
the bug! ...

I also look at your config and see that you configure two 
ClusterSessionListener, please remove one!

<ClusterListener
className="org.apache.catalina.cluster.session.ClusterSessionListener"/>

Thanks
Peter

Olve Hansen schrieb:

>Hi,
>First I must congratulate on the improvements of the clustering docs!
>Now I don't have to guess nearly as much as before when trying to set up
>clustering. :-)
>
>But I still have some troubles regarding clustering.
>We have a webapp clustered over three tomcats, all on the same machine,
>load balanced by mod_jk in an Apache frontend, using sticky lb, session
>replication by fastasyncqueue. 
>
>It seems OK - no session dropouts in sticky mode, we can turn on and off
>tomcats as we please.
>
>But, in the logs, we see numerous:
>SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
>Connection reset by peer' closing channel
>java.io.IOException: Connection reset by peer
>at sun.nio.ch.FileDispatcher.read0(Native Method)
>at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
>at
>o.a.c.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125)
>at
>o.a.c.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)
>
>I am trying to track these down, as they are marked as SEVERE. The
>tomcats sits on a Red Hat Linux server. It is configured with two
>networks, one open to the world, and firewalled, the other is internal,
>and is fully open. The internal net is used for clustering, db-access,
>and mod_jk connections.
>
>How severe are these? Can they be a result of two near concurrent
>requests, and the second request cancels the serialisation of the first
>session object? If not, where should I start to look?
>
>Log files and cluster setup follows:
>
>  
>



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

Re: session replication errors

Posted by Peter Rossbach <pr...@objektpark.de>.

Hey,

I have seen the same problem at centos. I start now testing with suse 
9.3 ...
Can you please test with following config:

<Receiver
className="org.apache.catalina.cluster.tcp.ReplicationListener"
tcpListenAddress="192.168.5.112"
tcpListenPort="4003"
tcpSelectorTimeout="100"
tcpThreadCount="6"/>

<Sender
className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
replicationMode="fastasyncqueue"
keepAliveTimeout="-1"
waitForAck="true"
ackTimeout="15000"/>

I think that keepAliveTimeout handling can close socket in the middle that we transfer a messages. :-(
Default timeout is one minute. I start change the implementation that socket not dropped.


Thanks
Peter


Olve Hansen schrieb:

>Hi,
>First I must congratulate on the improvements of the clustering docs!
>Now I don't have to guess nearly as much as before when trying to set up
>clustering. :-)
>
>But I still have some troubles regarding clustering.
>We have a webapp clustered over three tomcats, all on the same machine,
>load balanced by mod_jk in an Apache frontend, using sticky lb, session
>replication by fastasyncqueue. 
>
>It seems OK - no session dropouts in sticky mode, we can turn on and off
>tomcats as we please.
>
>But, in the logs, we see numerous:
>SEVERE: TCP Worker thread in cluster caught 'java.io.IOException:
>Connection reset by peer' closing channel
>java.io.IOException: Connection reset by peer
>at sun.nio.ch.FileDispatcher.read0(Native Method)
>at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
>at
>o.a.c.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:125)
>at
>o.a.c.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:69)
>
>I am trying to track these down, as they are marked as SEVERE. The
>tomcats sits on a Red Hat Linux server. It is configured with two
>networks, one open to the world, and firewalled, the other is internal,
>and is fully open. The internal net is used for clustering, db-access,
>and mod_jk connections.
>
>How severe are these? Can they be a result of two near concurrent
>requests, and the second request cancels the serialisation of the first
>session object? If not, where should I start to look?
>
>Log files and cluster setup follows:
>
>cluster part of server.xml (only tcpListenPort differs among our
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org