Posted to dev@tomcat.apache.org by bu...@apache.org on 2010/03/18 12:09:27 UTC

DO NOT REPLY [Bug 48934] New: Cluster's regression. When replication fails once, replication can be never done again.

https://issues.apache.org/bugzilla/show_bug.cgi?id=48934

           Summary: Cluster's regression. When replication fails once,
                    replication can be never done again.
           Product: Tomcat 6
           Version: 6.0.26
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: regression
          Priority: P2
         Component: Cluster
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: fujino.keiichi@oss.ntt.co.jp


I found a cluster regression in Tomcat 6.0.26.

Steps to reproduce:
=====
The cluster consists of tomcat1 and tomcat2.
(The transport className is
org.apache.catalina.tribes.transport.nio.PooledParallelSender.
I suspect PooledMultiSender behaves the same way.)
1. Stop tomcat2 during session replication.
   Session replication fails and a ChannelException is thrown.
2. Restart tomcat2.
3. Replicate the session again.
   The following exception is thrown:
   org.apache.catalina.tribes.ChannelException: Sender not connected.; No faulty
   members identified.
=====

The cause is r908741:
http://svn.apache.org/viewvc?view=revision&revision=908741
With this change, the sender is disconnected when replication fails.

The disconnect method in PooledParallelSender is as follows:
===
public synchronized void disconnect() {
    this.connected = false;
    super.disconnect();

}
===
It sets this.connected to false and then calls super.disconnect(),
which closes the queue.

As far as I can tell, once connected is set to false it never becomes true
again, and once the queue is closed it is never reopened.
Only ReplicationTransmitter#start can set connected back to true, and it is
also the only place where the queue is opened.

As a result, once replication fails, replication can never be done again.
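To make the failure mode concrete, here is a minimal Java model of the state described above. It is a sketch under stated assumptions, not Tribes code: ModelSender, NotConnectedException, send(), isConnected() and ModelSenderDemo are hypothetical names, start() stands in for ReplicationTransmitter#start, and a false return from send() stands in for the first ChannelException.

```java
// Hypothetical model of the behaviour described above; not the Tribes source.
// ModelSender and NotConnectedException are illustrative names.
class NotConnectedException extends RuntimeException {
    NotConnectedException() {
        super("Sender not connected.");
    }
}

class ModelSender {
    private boolean connected = false;
    private boolean queueOpen = false;

    // Stands in for ReplicationTransmitter#start: the only code path that
    // opens the queue and sets connected to true.
    synchronized void start() {
        queueOpen = true;
        connected = true;
    }

    // Mirrors the quoted disconnect(): clear the flag, then close the queue.
    synchronized void disconnect() {
        connected = false;
        queueOpen = false;
    }

    // Returns false when the target member is unreachable (standing in for the
    // first ChannelException). As with r908741, that failure disconnects the
    // whole sender, and nothing in this class reconnects it.
    synchronized boolean send(boolean memberReachable) {
        if (!connected || !queueOpen) {
            throw new NotConnectedException();
        }
        if (!memberReachable) {
            disconnect();
            return false;
        }
        return true;
    }

    synchronized boolean isConnected() {
        return connected;
    }
}

class ModelSenderDemo {
    public static void main(String[] args) {
        ModelSender sender = new ModelSender();
        sender.start();
        System.out.println(sender.send(true));   // tomcat2 up: prints true
        System.out.println(sender.send(false));  // tomcat2 stopped: prints false
        try {
            sender.send(true);                   // tomcat2 restarted
        } catch (NotConnectedException e) {
            System.out.println(e.getMessage());  // prints "Sender not connected."
        }
    }
}
```

Running ModelSenderDemo walks through the three reproduction steps: the third send fails even though the member is reachable again, because only start() could restore the connected flag and the queue.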

I do not know why r908741 was applied.
However, once a ChannelException has been thrown, the whole Sender becomes
unusable.
This is not good.

Can r908741 be reverted?
If not, what is the reason for r908741?

Best regards.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


DO NOT REPLY [Bug 48934] Cluster's regression. When replication fails once, replication can be never done again.

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=48934

--- Comment #1 from Filip Hanik <fh...@apache.org> 2010-03-18 13:56:35 UTC ---
Created an attachment (id=25146)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=25146)
Bug fix

Dear Fujino, as always you are right. The intended fix was to close sockets
that were potentially left in a CLOSE_WAIT state when something went wrong. But
instead of closing the actual sender that holds the TCP sockets, I accidentally
closed the entire sender system.
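The intended scope of the cleanup can be sketched with a small, hypothetical Java model. It is not the attached patch (attachment 25146); MemberSender, ModelSenderSystem and their methods are illustrative names. The point is only where the close happens: a failed send closes the socket owned by that member's sender, while the sender system as a whole stays connected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the intended scope of the cleanup; not the attached
// patch. MemberSender and ModelSenderSystem are illustrative names.
class MemberSender {
    private boolean socketOpen = false;

    void connect() { socketOpen = true; }

    // e.g. drop a socket that may be stuck in CLOSE_WAIT
    void close() { socketOpen = false; }

    boolean isOpen() { return socketOpen; }
}

class ModelSenderSystem {
    private boolean connected = false;
    private final Map<String, MemberSender> senders = new HashMap<>();

    void start() { connected = true; }

    boolean isConnected() { return connected; }

    // Returns false when the member is unreachable. Only that member's socket
    // is closed; the system stays connected and reopens the socket on the
    // next send.
    boolean send(String member, boolean memberReachable) {
        if (!connected) {
            throw new IllegalStateException("Sender not connected.");
        }
        MemberSender sender = senders.computeIfAbsent(member, m -> new MemberSender());
        if (!sender.isOpen()) {
            sender.connect();
        }
        if (!memberReachable) {
            sender.close(); // close the member's socket, not the sender system
            return false;
        }
        return true;
    }

    boolean hasOpenSocket(String member) {
        MemberSender sender = senders.get(member);
        return sender != null && sender.isOpen();
    }
}
```

With this scoping, the reproduction from the original report recovers: send("tomcat2", false) returns false and closes only tomcat2's socket, and the next send("tomcat2", true) reopens it and returns true.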
