Posted to users@tomcat.apache.org by Manak Bisht <ma...@iiitd.ac.in> on 2024/01/14 17:39:46 UTC

Tomcat not syncing existing sessions on restart

Hi,
I am using DeltaManager (static membership) with non-sticky load balancing
on two nodes. I have observed even load, and requests with the same
JSESSIONID being served successfully by both tomcats. This leads me to
conclude that session replication is working as expected when both nodes
are up.

However, when I restart either one of them, the newly restarted tomcat is
unable to serve requests from old sessions. The logs indicate that node
discovery is working but the session sync times out. New logins/sessions
work just fine though, implying that replication is working successfully
again.

*tomcat1.log*
13-Jan-2024 14:16:35.713 INFO [GroupChannel-Heartbeat-1]
org.apache.catalina.ha.tcp.SimpleTcpCluster.memberDisappeared Received
member
disappeared:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090,
alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 },
payload={}, command={}, domain={}, ]
13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1]
org.apache.catalina.ha.tcp.SimpleTcpCluster.memberAdded Replication member
added:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090,
alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 },
payload={}, command={}, domain={}, ]
13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1]
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.performBasicCheck
Suspect member, confirmed
alive.[org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090,
alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 },
payload={}, command={}, domain={}, ]]
*13-Jan-2024 14:45:24.354 WARNING [Tribes-Task-Receiver-4]
org.apache.catalina.ha.session.DeltaManager.deserializeSessions overload
existing session XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX*


*tomcat2.log*
13-Jan-2024 14:45:24.290 INFO [localhost-startStop-1]
org.apache.catalina.ha.session.DeltaManager.startInternal Register manager
localhost# to cluster element Engine with name Catalina
13-Jan-2024 14:45:24.291 INFO [localhost-startStop-1]
org.apache.catalina.ha.session.DeltaManager.startInternal Starting
clustering manager at localhost#
13-Jan-2024 14:45:24.363 INFO [localhost-startStop-1]
org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.report
ThroughputInterceptor Report[
Tx Msg:1 messages
Sent:0.00 MB (total)
Sent:0.00 MB (application)
Time:0.06 seconds
Tx Speed:0.01 MB/sec (total)
TxSpeed:0.01 MB/sec (application)
Error Msg:0
Rx Msg:15 messages
Rx Speed:0.00 MB/sec (since 1st msg)
Received:0.00 MB]

13-Jan-2024 14:45:24.368 INFO [localhost-startStop-1]
org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions Manager
[localhost#], requesting session state from
org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat1:8090,tomcat1,8090,
alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 },
payload={}, command={}, domain={}, ]. This operation will timeout if no
session state has been received within 60 seconds.
*13-Jan-2024 14:46:24.459 SEVERE [localhost-startStop-1]
org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions Manager
[localhost#]: No session state send at 1/13/24 2:45 PM received, timing out
after 60,167 ms.*

There is also a warning, but I am unsure of its significance.
I have tried setting sendAllSessions to false and increasing
stateTransferTimeout, to no avail.

This is my clustering config for tomcat1 (the config is the same for
tomcat2 with the host as tomcat1 and uniqueId
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}) -

<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
    channelSendOptions="6" channelStartOptions="3">

    <Manager className="org.apache.catalina.ha.session.DeltaManager"/>

    <Channel className="org.apache.catalina.tribes.group.GroupChannel">
        <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
            address="0.0.0.0"
            port="8090"
            autoBind="0"/>

        <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
            <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
        </Sender>

        <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
            <Member className="org.apache.catalina.tribes.membership.StaticMember"
                port="8090"
                host="tomcat2"
                uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2}"/>
        </Interceptor>
        <Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
    </Channel>

    <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>

    <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>

Any help would be greatly appreciated.

Sincerely,
Manak Bisht

Re: [OT] Tomcat not syncing existing sessions on restart

Posted by "Terence M. Bandoian" <te...@tmbsw.com>.
I would suggest focusing on Docker networking rather than Tomcat. My 
guess is that how that works will inform your Tomcat configuration. You 
might also try first getting it to work with two Docker instances on a 
single machine.

-Terence Bandoian

On 3/1/2024 11:59 AM, Manak Bisht wrote:
> I am fairly certain now that the docker container is the problem. I am
> unable to replicate the issue without it. Using the hostname/IP address of
> the host (tomcat/ip) for the receiver always causes the following problem,
> 01-Mar-2024 22:30:32.315 INFO [main]
> org.apache.catalina.tribes.transport.ReceiverBase.bind Unable to bind
> server socket to:tomcat/ip:4000 throwing error.
> 01-Mar-2024 22:30:32.315 SEVERE [main]
> org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start
> cluster receiver
>   java.net.BindException: Cannot assign requested address
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
> at
> org.apache.catalina.tribes.transport.ReceiverBase.bind(ReceiverBase.java:184)
> at
> org.apache.catalina.tribes.transport.nio.NioReceiver.bind(NioReceiver.java:125)
> at
> org.apache.catalina.tribes.transport.nio.NioReceiver.start(NioReceiver.java:89)
> at
> org.apache.catalina.tribes.group.ChannelCoordinator.internalStart(ChannelCoordinator.java:150)
> at
> org.apache.catalina.tribes.group.ChannelCoordinator.start(ChannelCoordinator.java:102)
> at
> org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
> at
> org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
> at
> org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor.start(StaticMembershipInterceptor.java:108)
> at
> org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
> at
> org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
> at
> org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor.start(TcpPingInterceptor.java:65)
> at
> org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
> at
> org.apache.catalina.tribes.group.GroupChannel.start(GroupChannel.java:421)
> at
> org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal(SimpleTcpCluster.java:544)
> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
> at
> org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:902)
> at
> org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:262)
> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
> at
> org.apache.catalina.core.StandardService.startInternal(StandardService.java:439)
> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
> at
> org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:760)
> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
> at org.apache.catalina.startup.Catalina.start(Catalina.java:625)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
> at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)
>
> Either address binding does not work for any address inside the container
> or just binding to the address of the host machine does not work. I am
> leaning towards the latter because the *<Member> *element has never
> exhibited this issue. Here's what I have already tried/checked,
>
>     - The receiver/address port of the container is mapped to the same port
>     on the host
>     - The IP of the host is reachable via ping and telnet from the container.
>     - Running the following code from inside the container always works
>     java.net.InetAddress bind = java.net.InetAddress.getByName("tomcat");
>     System.out.println(bind); // Output: tomcat/ip
>
> I have read a lot of resources and tried a variety of solutions to no
> avail. Literature covering session replication with containerisation is
> also sparse. If someone has tried this before or has any ideas, please let
> me know, I would greatly appreciate it.
>
> Sincerely,
> Manak Bisht
>
>
> On Mon, Feb 12, 2024 at 9:07 PM Christopher Schultz <
> chris@christopherschultz.net> wrote:
>
>> Manak,
>>
>> On 2/12/24 10:33, Manak Bisht wrote:
>>> Chris,
>>>
>>> On Mon, 12 Feb 2024, 20:52 Christopher Schultz, <
>>> chris@christopherschultz.net> wrote:
>>>
>>>> I wouldn't refuse to configure, since anyone using
>>>> 0.0.0.0 with /separate/ hosts wouldn't experience this problem.
>>>
>>> I am using separate hosts (two docker containers on two different
>> machines)
>>> in my main deployment. I just reproduced the problem on the same host to
>>> rule out network issues.
>> Thanks for the clarification. For some reason, I thought this was two
>> Docker containers on the same host.
>>
>> -chris
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail:users-help@tomcat.apache.org
>>
>>


Re: [OT] Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
I am fairly certain now that the docker container is the problem. I am
unable to replicate the issue without it. Using the hostname/IP address of
the host (tomcat/ip) for the receiver always causes the following problem,
01-Mar-2024 22:30:32.315 INFO [main]
org.apache.catalina.tribes.transport.ReceiverBase.bind Unable to bind
server socket to:tomcat/ip:4000 throwing error.
01-Mar-2024 22:30:32.315 SEVERE [main]
org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start
cluster receiver
 java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at
org.apache.catalina.tribes.transport.ReceiverBase.bind(ReceiverBase.java:184)
at
org.apache.catalina.tribes.transport.nio.NioReceiver.bind(NioReceiver.java:125)
at
org.apache.catalina.tribes.transport.nio.NioReceiver.start(NioReceiver.java:89)
at
org.apache.catalina.tribes.group.ChannelCoordinator.internalStart(ChannelCoordinator.java:150)
at
org.apache.catalina.tribes.group.ChannelCoordinator.start(ChannelCoordinator.java:102)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at
org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor.start(StaticMembershipInterceptor.java:108)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at
org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor.start(TcpPingInterceptor.java:65)
at
org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at
org.apache.catalina.tribes.group.GroupChannel.start(GroupChannel.java:421)
at
org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal(SimpleTcpCluster.java:544)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:902)
at
org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:262)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.StandardService.startInternal(StandardService.java:439)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:760)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.startup.Catalina.start(Catalina.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)

Either address binding does not work for any address inside the container
or just binding to the address of the host machine does not work. I am
leaning towards the latter because the *<Member> *element has never
exhibited this issue. Here's what I have already tried/checked,

   - The receiver/address port of the container is mapped to the same port
   on the host
   - The IP of the host is reachable via ping and telnet from the container.
   - Running the following code from inside the container always works
   java.net.InetAddress bind = java.net.InetAddress.getByName("tomcat");
   System.out.println(bind); // Output: tomcat/ip

I have read a lot of resources and tried a variety of solutions to no
avail. Literature covering session replication with containerisation is
also sparse. If someone has tried this before or has any ideas, please let
me know, I would greatly appreciate it.

Sincerely,
Manak Bisht


On Mon, Feb 12, 2024 at 9:07 PM Christopher Schultz <
chris@christopherschultz.net> wrote:

> Manak,
>
> On 2/12/24 10:33, Manak Bisht wrote:
> > Chris,
> >
> > On Mon, 12 Feb 2024, 20:52 Christopher Schultz, <
> > chris@christopherschultz.net> wrote:
> >
> >> I wouldn't refuse to configure, since anyone using
> >> 0.0.0.0 with /separate/ hosts wouldn't experience this problem.
> >
> >
> > I am using separate hosts (two docker containers on two different
> machines)
> > in my main deployment. I just reproduced the problem on the same host to
> > rule out network issues.
>
> Thanks for the clarification. For some reason, I thought this was two
> Docker containers on the same host.
>
> -chris
>
>
>

Re: [OT] Tomcat not syncing existing sessions on restart

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Manak,

On 2/12/24 10:33, Manak Bisht wrote:
> Chris,
> 
> On Mon, 12 Feb 2024, 20:52 Christopher Schultz, <
> chris@christopherschultz.net> wrote:
> 
>> I wouldn't refuse to configure, since anyone using
>> 0.0.0.0 with /separate/ hosts wouldn't experience this problem.
> 
> 
> I am using separate hosts (two docker containers on two different machines)
> in my main deployment. I just reproduced the problem on the same host to
> rule out network issues.

Thanks for the clarification. For some reason, I thought this was two 
Docker containers on the same host.

-chris



Re: [OT] Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
Chris,

On Mon, 12 Feb 2024, 20:52 Christopher Schultz, <
chris@christopherschultz.net> wrote:

> I wouldn't refuse to configure, since anyone using
> 0.0.0.0 with /separate/ hosts wouldn't experience this problem.


I am using separate hosts (two docker containers on two different machines)
in my main deployment. I just reproduced the problem on the same host to
rule out network issues.

Sincerely,
Manak Bisht

Re: [OT] Tomcat not syncing existing sessions on restart

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Mark,

On 2/9/24 06:14, Mark Thomas wrote:
> With the Receiver using address="0.0.0.0" I see the same issues you do. 
> I'm not yet convinced that is a bug.

If this is known to essentially always not-work... should we log 
something at startup? I wouldn't refuse to configure, since anyone using 
0.0.0.0 with /separate/ hosts wouldn't experience this problem.
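
For illustration, a minimal sketch of what such a startup check could look like. This is not Tomcat code: the class and method names are hypothetical, and it assumes the configured Receiver address is available as a plain string. It only detects the wildcard case and prints a warning rather than refusing to start.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Tomcat: flags a Receiver address that is a wildcard.
final class ReceiverAddressCheck {

    // Returns true if the configured address is a wildcard address such as 0.0.0.0.
    static boolean isWildcard(String configuredAddress) {
        try {
            return InetAddress.getByName(configuredAddress).isAnyLocalAddress();
        } catch (UnknownHostException e) {
            // An unresolvable name is a different problem; it is not a wildcard.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the address used in the configuration discussed in this thread.
        String address = args.length > 0 ? args[0] : "0.0.0.0";
        if (isWildcard(address)) {
            System.out.println("WARNING: receiver address " + address
                    + " is a wildcard; other cluster members may not be able to identify this node.");
        } else {
            System.out.println("Receiver address " + address + " is not a wildcard.");
        }
    }
}
```

A warning (rather than a hard failure) matches the suggestion above not to refuse the configuration outright.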

-chris



Re: Tomcat not syncing existing sessions on restart

Posted by Mark Thomas <ma...@apache.org>.
On 10/03/2024 16:59, Manak Bisht wrote:
> On Fri, Feb 9, 2024 at 4:45 PM Mark Thomas <ma...@apache.org> wrote:
> 
>> Using 0.0.0.0 as the address for the receiver is going to cause
>> problems. I see similar issues with 11.0.x as 8.5.x. I haven't dug too
>> deeply into things as a) I am short of time and b) I'm not convinced
>> this should/could work anyway.
>>
>> What seems to happen is that the use of 0.0.0.0 confuses the cluster as
>> to which node is which - I think because multiple nodes are using
>> 0.0.0.0. That causes the failure of the initial state synchronisation.
>>
> 
> Yes, this was indeed the problem. I chose 0.0.0.0 because binding to the
> host's ip threw the following error -
> 
>> 01-Mar-2024 22:30:32.315 SEVERE [main]
>> org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start
>> cluster receiver
>>   java.net.BindException: Cannot assign requested address
> 
> The full stack trace is available in my previous mail.
> 
> To identify the problem, I ran my application outside the container, where
> I did not encounter the above error. This led me to investigate on the
> Docker side of things. By default, a Docker container uses a bridge
> network, so binding to the host's ip address from inside the container is
> simply not possible even when the receiver port has been correctly mapped.
> I was able to get it to work by passing the --network=host flag to my
> docker create command. This puts the container inside the host's network,
> essentially de-containerizing its networking.
> Although this works, this is not desirable because this opens every port on
> the container, increasing the surface area for security and debugging.
> 0.0.0.0 is a natural choice and is used by a lot of applications running on
> Docker, even the official Tomcat image on Docker Hub does so.

There is no official Docker image provided by the Tomcat project.

> I am no expert on Docker or Tomcat, however, I don't think this is ideal.
> Docker has become so ubiquitous that I couldn't imagine deploying without
> it, but using clustering makes me lose some of the benefits of it. I have
> not looked into it, but this might also impact the BackupManager because it
> also requires a Receiver element.
> 
> On Mon, Feb 12, 2024 at 8:52 PM Christopher Schultz <
> chris@christopherschultz.net> wrote:
> 
>> If this is known to essentially always not-work... should we log
>> something at startup?
> 
> I think this is the least that we could do, I am willing to work on this.
> However, I also think that this should be looked into deeper to solve the
> actual problem.

Thinking about this a little more (although I am still short on time so 
haven't investigated) I wonder if the issue is that a node needs to 
advertise to other nodes what IP address it is listening on. If it 
advertises 0.0.0.0 the other nodes have no way to contact it. Further 
(and you can look at the acceptor unlock code for the details) trying to 
determine a valid IP address to provide to other nodes is non-trivial 
(and the acceptor case is only looking at localhost, not across a network).

> I understand that this discussion might be more fit for the dev mailing
> list, please let me know if you think the above holds merit, and I will
> move it there.

You start to get into having to separate the IP address a node listens 
on and the IP address it advertises for other nodes to contact it 
(similar to HTTP or JMX behind a proxy).

I'm not a docker expert but it looks to me from a quick Google search 
that the expectation in this case is that you should use swarm mode 
which provides an overlay network across the nodes.
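
To make the "which address do I advertise" problem concrete, here is a rough sketch that lists the addresses a node could plausibly hand to its peers. It is an illustration only, not what Tomcat does: the class name is hypothetical, and the filtering rule (skip interfaces that are down or loopback, skip wildcard, loopback and link-local addresses) is an assumption about what "plausible" means.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Tomcat: lists local addresses that peers might be able to reach.
final class AdvertisableAddresses {

    static List<InetAddress> candidates() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
            if (interfaces == null) {
                return result; // no interfaces found at all
            }
            for (NetworkInterface nif : Collections.list(interfaces)) {
                if (!nif.isUp() || nif.isLoopback()) {
                    continue; // a peer cannot reach us through a down or loopback interface
                }
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (addr.isAnyLocalAddress() || addr.isLoopbackAddress() || addr.isLinkLocalAddress()) {
                        continue; // not meaningful to a node on another machine
                    }
                    result.add(addr);
                }
            }
        } catch (SocketException e) {
            // Interface enumeration failed part-way; return whatever was collected so far.
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetAddress addr : candidates()) {
            System.out.println("candidate: " + addr.getHostAddress());
        }
    }
}
```

The list can be empty or hold several entries, and inside a container on a bridge network it would presumably show the container's own address rather than the host's, which is why picking one automatically is hard.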

Mark



Re: Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
On Fri, Feb 9, 2024 at 4:45 PM Mark Thomas <ma...@apache.org> wrote:

> Using 0.0.0.0 as the address for the receiver is going to cause
> problems. I see similar issues with 11.0.x as 8.5.x. I haven't dug too
> deeply into things as a) I am short of time and b) I'm not convinced
> this should/could work anyway.
>
> What seems to happen is that the use of 0.0.0.0 confuses the cluster as
> to which node is which - I think because multiple nodes are using
> 0.0.0.0. That causes the failure of the initial state synchronisation.
>

Yes, this was indeed the problem. I chose 0.0.0.0 because binding to the
host's ip threw the following error -

> 01-Mar-2024 22:30:32.315 SEVERE [main]
> org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start
> cluster receiver
>  java.net.BindException: Cannot assign requested address

The full stack trace is available in my previous mail.

To identify the problem, I ran my application outside the container, where
I did not encounter the above error. This led me to investigate on the
Docker side of things. By default, a Docker container uses a bridge
network, so binding to the host's ip address from inside the container is
simply not possible even when the receiver port has been correctly mapped.
I was able to get it to work by passing the --network=host flag to my
docker create command. This puts the container inside the host's network,
essentially de-containerizing its networking.
Although this works, this is not desirable because this opens every port on
the container, increasing the surface area for security and debugging.
0.0.0.0 is a natural choice and is used by a lot of applications running on
Docker, even the official Tomcat image on Docker Hub does so.
I am no expert on Docker or Tomcat, however, I don't think this is ideal.
Docker has become so ubiquitous that I couldn't imagine deploying without
it, but using clustering makes me lose some of the benefits of it. I have
not looked into it, but this might also impact the BackupManager because it
also requires a Receiver element.

On Mon, Feb 12, 2024 at 8:52 PM Christopher Schultz <
chris@christopherschultz.net> wrote:

> If this is known to essentially always not-work... should we log
> something at startup?

I think this is the least that we could do, I am willing to work on this.
However, I also think that this should be looked into deeper to solve the
actual problem.

I understand that this discussion might be more fit for the dev mailing
list, please let me know if you think the above holds merit, and I will
move it there.

Sincerely,
Manak Bisht

Re: Tomcat not syncing existing sessions on restart

Posted by Mark Thomas <ma...@apache.org>.
On 09/02/2024 07:51, Manak Bisht wrote:
> On Fri, Feb 9, 2024 at 3:25 AM Mark Thomas <ma...@apache.org> wrote:
> 
>> Same JRE?
>>
> Yes, 8.0.402
>
>> Generally, I wouldn't use 0.0.0.0, I'd use a specific IP address. I'm
>> not sure how the clustering would behave with 0.0.0.0

Using 0.0.0.0 as the address for the receiver is going to cause 
problems. I see similar issues with 11.0.x as 8.5.x. I haven't dug too 
deeply into things as a) I am short of time and b) I'm not convinced 
this should/could work anyway.

What seems to happen is that the use of 0.0.0.0 confuses the cluster as 
to which node is which - I think because multiple nodes are using 
0.0.0.0. That causes the failure of the initial state synchronisation.

> That's the problem really. Using the DNS name or IP address causes the
> following error -

I am as sure as I can be that the issue you are seeing is environmental. 
I have configured my test cluster with:

- your cluster configuration with changes to host names and IP addresses
- Java 8.0.402
- Tomcat 8.5.x

With the Receiver using address="0.0.0.0" I see the same issues you do. 
I'm not yet convinced that is a bug.

With the Receiver using address="hostname" the cluster starts but 
doesn't work. Examining the logs shows that is because the host name 
resolves to a loopback address. I'd class that as behaving as expected. 
I could always change the host's config if I wanted the name to resolve 
to the public IP.
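
A quick way to check for the loopback case without reading the cluster logs is sketched below. The class name is hypothetical and this is only a standalone check using the standard java.net API; it is not something Tomcat runs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: reports whether a host name resolves to any loopback address.
final class LoopbackCheck {

    static boolean resolvesToLoopback(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (addr.isLoopbackAddress()) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            // The name does not resolve at all, so it cannot resolve to loopback.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the value used for the Receiver's address attribute.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " resolves to loopback: " + resolvesToLoopback(host));
    }
}
```

If this prints true for the name given to the Receiver, other nodes would be told to contact a loopback address, which would explain a cluster that starts but does not exchange state.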

With the Receiver using address="ip-address" the cluster starts and log 
messages show that cluster state is exchanged within a few milliseconds.

That leads me to conclude that the BindException you see is a 
configuration and/or environmental issue although I don't see why your 
simple test works but clustering doesn't. Perhaps a conflict with 
something else in your Tomcat configuration?

Something to try is starting Tomcat with the Receiver using 0.0.0.0 and 
then using netstat to see which address/port combinations are being used.

Mark



Re: Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
On Fri, Feb 9, 2024 at 3:25 AM Mark Thomas <ma...@apache.org> wrote:

> Same JRE?

Yes, 8.0.402

> Generally, I wouldn't use 0.0.0.0, I'd use a specific IP address. I'm
> not sure how the clustering would behave with 0.0.0.0

That's the problem really. Using the DNS name or IP address causes the
following error -

09-Feb-2024 13:08:32.440 SEVERE [main]
org.apache.catalina.startup.Catalina.start The required Server component
failed to start so Tomcat is unable to start.
 org.apache.catalina.LifecycleException: Failed to start component
[StandardServer[8006]]
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
at org.apache.catalina.startup.Catalina.start(Catalina.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)
Caused by: org.apache.catalina.LifecycleException: Failed to start
component [StandardService[Catalina]]
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
at
org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:760)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
... 7 more
Caused by: org.apache.catalina.LifecycleException: Failed to start
component [StandardEngine[Catalina]]
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
at
org.apache.catalina.core.StandardService.startInternal(StandardService.java:439)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
... 9 more
Caused by: org.apache.catalina.LifecycleException: Failed to start
component [org.apache.catalina.ha.tcp.SimpleTcpCluster[Catalina]]
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
at
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:902)
at
org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:262)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
... 11 more
Caused by: org.apache.catalina.LifecycleException:
org.apache.catalina.tribes.ChannelException: java.net.BindException: Cannot
assign requested address; No faulty members identified.
at
org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal(SimpleTcpCluster.java:549)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
... 14 more
Caused by: org.apache.catalina.tribes.ChannelException: java.net.BindException: Cannot assign requested address; No faulty members identified.
at org.apache.catalina.tribes.group.ChannelCoordinator.internalStart(ChannelCoordinator.java:184)
at org.apache.catalina.tribes.group.ChannelCoordinator.start(ChannelCoordinator.java:102)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor.start(StaticMembershipInterceptor.java:108)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor.start(TcpPingInterceptor.java:65)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
at org.apache.catalina.tribes.group.GroupChannel.start(GroupChannel.java:421)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal(SimpleTcpCluster.java:544)
... 15 more
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at org.apache.catalina.tribes.transport.ReceiverBase.bind(ReceiverBase.java:184)
at org.apache.catalina.tribes.transport.nio.NioReceiver.bind(NioReceiver.java:125)
at org.apache.catalina.tribes.transport.nio.NioReceiver.start(NioReceiver.java:89)
at org.apache.catalina.tribes.group.ChannelCoordinator.internalStart(ChannelCoordinator.java:150)
... 25 more

The *host* attribute of the *Member* element does not exhibit the same
problem.
The DNS name/IP is also reachable via ping and telnet.
I even wrote a simple test to check this, and it resolves successfully:
java.net.InetAddress bind = java.net.InetAddress.getByName("tomcat");
System.out.println(bind); // Output: tomcat/ip
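
Note that resolving the name only shows that DNS works, whereas the
BindException in the trace is raised when the receiver binds its listening
socket. A minimal sketch of a closer check, assuming nothing about Tomcat
itself: the class BindCheck and its two helper methods are hypothetical names
made up for illustration, and the default host "localhost" is only a
placeholder for the value used in the Receiver's address attribute.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper, not part of Tomcat: checks whether an address can be bound locally.
public class BindCheck {

    /** True if addr is a wildcard or loopback address, or is assigned to a local interface. */
    public static boolean isLocal(InetAddress addr) {
        try {
            return addr.isAnyLocalAddress()
                    || addr.isLoopbackAddress()
                    || NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false;
        }
    }

    /** Tries to bind a server socket channel to addr:port; port 0 lets the OS pick a free port. */
    public static boolean canBind(InetAddress addr, int port) {
        try (ServerSocketChannel channel = ServerSocketChannel.open()) {
            channel.bind(new InetSocketAddress(addr, port));
            return true;
        } catch (IOException e) {
            System.out.println("bind failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder defaults; pass the receiver's host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 0;
        InetAddress addr = InetAddress.getByName(host);
        System.out.println(addr + " local=" + isLocal(addr) + " bindable=" + canBind(addr, port));
    }
}
```

If the name resolves to an address that is not assigned to any interface on
the machine running Tomcat (for example a NAT or load balancer address), the
bind attempt would be expected to fail with the same "Cannot assign requested
address" message even though ping and telnet succeed.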

Sincerely,
Manak Bisht

Re: Tomcat not syncing existing sessions on restart

Posted by Mark Thomas <ma...@apache.org>.
On 07/02/2024 11:43, Manak Bisht wrote:
> I think I have narrowed down the problem. For Tomcat 9 (v9.0.85), using
> 0.0.0.0 for the local member and receiver works fine. However, the same
> does not work in Tomcat 8.5 (v8.5.98).

Same JRE?

Generally, I wouldn't use 0.0.0.0; I'd use a specific IP address. I'm
not sure how the clustering would behave with 0.0.0.0.

Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
I think I have narrowed down the problem. For Tomcat 9 (v9.0.85), using
0.0.0.0 for the local member and receiver works fine. However, the same
does not work in Tomcat 8.5 (v8.5.98).

Sincerely,
Manak Bisht



Re: Tomcat not syncing existing sessions on restart

Posted by Mark Thomas <ma...@apache.org>.
On 31/01/2024 13:33, Manak Bisht wrote:
> I tried tweaking all the settings that I could think of but I am unable to
> sync sessions on restart even on a stock Tomcat 8.5.98 installation using
> your provided war. I am unable to identify whether this is actually a bug
> or something wrong with my configuration (this is far more likely). Could
> you please share your server.xml? Did you make any other changes?
> 
> Sincerely,
> Manak Bisht

Here is the cluster configuration from the first node of my test environment:

<Cluster
     className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
     channelSendOptions="6"
     >

   <Manager
       className="org.apache.catalina.ha.session.DeltaManager"
       expireSessionsOnShutdown="false"
       notifyListenersOnReplication="true"
       />

   <Channel
       className="org.apache.catalina.tribes.group.GroupChannel">

     <Membership
         className="org.apache.catalina.tribes.membership.StaticMembershipService"
         >

       <Member
           className="org.apache.catalina.tribes.membership.StaticMember"
           port="4000"
           host="192.168.23.32"
           uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}"
           />

       <Member
           className="org.apache.catalina.tribes.membership.StaticMember"
           port="4000"
           host="192.168.23.33"
           uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2}"
           />
     </Membership>

     <Receiver
         className="org.apache.catalina.tribes.transport.nio.NioReceiver"
         address="192.168.23.32"
         port="4000"
         autoBind="0"
         selectorTimeout="5000"
         maxThreads="6"
         />

     <Sender
         className="org.apache.catalina.tribes.transport.ReplicationTransmitter"
         >

       <Transport
           className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
           />

     </Sender>

     <Interceptor
         className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"
         performReadTest="true"
         />
     <Interceptor
         className="org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor"
         />
   </Channel>

   <Deployer
       className="org.apache.catalina.ha.deploy.FarmWarDeployer"
       tempDir="cluster-temp"
       deployDir="webapps"
       watchDir="cluster-watch"
       watchEnabled="true"
       />

   <Valve
       className="org.apache.catalina.ha.tcp.ReplicationValve"
       filter=""
       />

   <Valve
       className="org.apache.catalina.ha.session.JvmRouteBinderValve"
       />

   <ClusterListener
       className="org.apache.catalina.ha.session.ClusterSessionListener"
       />

</Cluster>



Re: Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
I tried tweaking all the settings that I could think of but I am unable to
sync sessions on restart even on a stock Tomcat 8.5.98 installation using
your provided war. I am unable to identify whether this is actually a bug
or something wrong with my configuration (this is far more likely). Could
you please share your server.xml? Did you make any other changes?

Sincerely,
Manak Bisht


Re: Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
Hi Mark,
I tried running your *cluster-test* war example on a stock 8.5.98
installation; however, I am facing the same issue. Session sync is not
triggered when a node restarts. Could you please share your configuration?

Sincerely,
Manak Bisht

Re: Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
Thanks for going the extra mile to help me out on this. I really appreciate
it.
As far as I am aware, the auto detection of local member is only available
post v9.0.17 and the <LocalMember> tag was added in v8.5.1. Unfortunately,
I happen to be working in an environment where 8.5.0 is the highest non-EOL
version available. I know I am playing very fast and loose with the
definition of EOL when the current version is 8.5.98. Since the
StaticMembershipInterceptor has been available for a long time, I thought I
could make it work without those two features.

Sincerely,
Manak Bisht

On Tue, Jan 23, 2024 at 3:56 PM Mark Thomas <ma...@apache.org> wrote:

> The other difference is that you don't appear to have defined the local
> member of the cluster. You should define all members of the cluster,
> including the local member, on each node. The local member can be
> defined explicitly as LocalMember or as an ordinary Member and Tomcat
> will figure out it is the local one.
>

Re: Tomcat not syncing existing sessions on restart

Posted by Mark Thomas <ma...@apache.org>.
I have configured my standard cluster test environment for a 2-node 
cluster, using DeltaManager and static membership. httpd is configured 
for non-sticky load-balancing.

Each node has the Manager web application and my simple cluster-test 
deployed.
https://people.apache.org/~markt/dev/cluster-test.war

Starting both nodes and connecting directly to each manager
instance shows no sessions in cluster-test as expected.

Requesting the cluster index page via httpd triggers the creation of a 
single session in cluster-test. Requests alternate between node 1 and 
node 2 as expected. Examining the session via the manager app shows that 
the changes to the session are being correctly replicated.

Stopping node 2 causes further requests to be directed to node 1 only.

Starting node 2 shows that the session is replicated correctly from node 
1. I see the updated session in both nodes via the Manager app.

Also the following test works:
- create a session
- stop node 2
- further requests (handled by node 1)
- stop requests
- start node 2
- stop node 1
- resume requests (handled by node 2)

One difference is that I am using the StaticMembershipService rather 
than the StaticMembershipInterceptor. I don't think that will make any 
difference.

The other difference is that you don't appear to have defined the local 
member of the cluster. You should define all members of the cluster, 
including the local member, on each node. The local member can be 
defined explicitly as LocalMember or as an ordinary Member and Tomcat 
will figure out it is the local one.

Mark





Re: Tomcat not syncing existing sessions on restart

Posted by Manak Bisht <ma...@iiitd.ac.in>.
I thought that this https://marc.info/?l=tomcat-user&m=119376798217922&w=2
might be the problem.
*"The uniqueId is used to be able to differentiate between the same node
joining a cluster, then crashing and then rejoining again. if the uniqueId
didn't change in between this, there is no way to tell the difference
between a node going down, or just leaving the cluster and rejoining."*
So, I tried creating a session while one of the nodes was down, but that
session did not sync either when the other node came back online.
In that case, I would also expect
org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions to
proceed with no state sync rather than timing out.

I have also checked the time on both the servers using the Linux date
command and they seem to be in sync. The timezone flag passed to the
JAVA_OPTS argument in catalina.sh is also the same. Please let me know if
any more information is required to help debug this issue.

Sincerely,
Manak Bisht
