Posted to derby-dev@db.apache.org by "V.Narayanan (JIRA)" <ji...@apache.org> on 2008/02/21 19:43:19 UTC

[jira] Issue Comment Edited: (DERBY-3364) Replication failover implementation must be modified to fail at the master after slave has been stopped

    [ https://issues.apache.org/jira/browse/DERBY-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571136#action_12571136 ] 

narayanan edited comment on DERBY-3364 at 2/21/08 10:41 AM:
--------------------------------------------------------------

Thank you for the ctrl+c tip, Jorgen.

Please find below an explanation of the changes to each file, followed by
runs of the attached repro.

General Failure Analysis
-------------------------------------- 

I observed several inconsistent behaviours across the runs:

1) The first failover attempt reports success
2) Failover hangs the second time it is called
3) exit hangs after the first failover

These were due to a combination of reasons:

a) The read on the InputStream obtained from the client socket was not
   timing out.
b) The Transmitter should close the socket whether the failover succeeds or
   fails. (This is a client socket, so I am not sure it was a big problem.)
c) The log shipper thread should be terminated whether the failover fails or
   succeeds.
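
To illustrate reason (a), here is a minimal, self-contained sketch of a read that times out instead of blocking forever. It is not the Derby code: the class name ReadTimeoutSketch, the helper readTimesOut and the silent local server are assumptions made up for the example. Only Socket.setSoTimeout and SocketTimeoutException are the standard java.net API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    /** Returns true if the blocking read gave up after timeoutMillis. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // A local peer that accepts the connection but never sends anything,
        // standing in for a slave that no longer answers.
        try (ServerSocket silentPeer = new ServerSocket(0);
             Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                        silentPeer.getLocalPort());
             Socket accepted = silentPeer.accept()) {

            // Without this call, read() below would block indefinitely.
            client.setSoTimeout(timeoutMillis);

            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or end-of-stream arrived, no timeout
            } catch (SocketTimeoutException e) {
                return true;  // the read was abandoned after the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The timeout applies to every subsequent read on streams obtained from that socket, so a caller waiting for an acknowledgement gets an exception it can handle rather than a hang.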

Files Modified and Explanation
-----------------------------------------------

M      java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessageTransmit.java

* Set a timeout on the socket, which takes effect as a timeout on reads from
  its input streams.
* Add a method to tear down the socket.
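
The two changes above could look roughly like the following sketch. The class TransmitterSketch, its constructor arguments and isTornDown are hypothetical names chosen for illustration; they are not taken from ReplicationMessageTransmit or from the patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class TransmitterSketch {
    private final Socket socket;
    private final InputStream in;
    private final OutputStream out;

    /** Wraps an already connected socket and bounds how long reads may block. */
    TransmitterSketch(Socket socket, int readTimeoutMillis) throws IOException {
        this.socket = socket;
        socket.setSoTimeout(readTimeoutMillis);
        this.in = socket.getInputStream();
        this.out = socket.getOutputStream();
    }

    /** Closes the streams and then the socket, even if a stream close fails. */
    void tearDown() throws IOException {
        try {
            in.close();
            out.close();
        } finally {
            socket.close();
        }
    }

    /** True once the underlying socket has been closed. */
    boolean isTornDown() {
        return socket.isClosed();
    }
}
```

Because tearDown declares IOException, every caller has to decide what to do when closing fails, which is what leads to the MasterController change described next.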

M      java/engine/org/apache/derby/impl/services/replication/master/MasterController.java

* Handle the IOException that stopLogShipment now throws because tearDown
  can throw it.
* Stop the log shipper when failover fails, in addition to stopping it on
  success.
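
A hedged sketch of that control flow: the log shipper is stopped in a finally block so that it runs on both paths, and the IOException from stopping it is handled locally. The interfaces LogShipper and FailoverRequest and the boolean return value are simplifications invented for this example. The real MasterController reports failure with an error such as XRE21 rather than a boolean.

```java
import java.io.IOException;

public class FailoverSketch {

    /** Hypothetical stand-in for the log shipper. */
    interface LogShipper {
        void stopLogShipment() throws IOException;
    }

    /** Hypothetical stand-in for the failover exchange with the slave. */
    interface FailoverRequest {
        void run() throws IOException;
    }

    /** Returns true if the failover exchange completed without an I/O error. */
    static boolean failover(FailoverRequest request, LogShipper shipper) {
        boolean succeeded = false;
        try {
            request.run();
            succeeded = true;
        } catch (IOException e) {
            // For example, the read of the acknowledgement timed out
            // because the slave has been stopped.
        } finally {
            // Runs after success and after failure alike.
            try {
                shipper.stopLogShipment();
            } catch (IOException e) {
                // Tearing down the connection failed; the connection is
                // being abandoned anyway, so the outcome is unchanged.
            }
        }
        return succeeded;
    }
}
```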

M      java/engine/org/apache/derby/impl/services/replication/master/AsynchronousLogShipper.java

* Make the log shipper tear down the socket and the streams obtained from the socket.
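
As a rough illustration of a shipper thread that both terminates and releases its connection when asked to stop, consider the sketch below. LogShipperSketch, its stopShipping flag and the sleep standing in for shipping work are assumptions for the example and do not reflect how AsynchronousLogShipper is actually written.

```java
import java.io.Closeable;
import java.io.IOException;

public class LogShipperSketch extends Thread {
    private final Closeable connection;
    private volatile boolean stopShipping = false;

    /** connection stands in for the transmitter and its socket. */
    LogShipperSketch(Closeable connection) {
        this.connection = connection;
        setDaemon(true); // never keep the JVM alive just for shipping
    }

    @Override
    public void run() {
        while (!stopShipping) {
            try {
                // Placeholder for shipping one chunk of log to the slave.
                Thread.sleep(50);
            } catch (InterruptedException e) {
                // Woken up by stopLogShipment(); the loop condition decides.
            }
        }
    }

    /** Asks the thread to finish and tears down the connection. */
    void stopLogShipment() throws IOException {
        stopShipping = true;
        interrupt();
        connection.close();
    }
}
```

A thread that is left running, or a socket that is left open, is the kind of leftover that could plausibly explain a hang on a later failover or on exit.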

Repro Runs
-------------------

Embedded
--------

ij version 10.4
ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true';
ij> call syscs_util.syscs_freeze_database(); 
0 rows inserted/updated/deleted
ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost'; 

Did a ctrl+c on slave here

ij(CONNECTION1)> connect 'jdbc:derby:masterDB;user=oystein;password=pass;failover=true'; 
ERROR XRE21: Error occurred while performing failover for database 'masterDB', Failover attempt was aborted.
ij(CONNECTION1)> exit;

Client
------

ij version 10.4
ij> connect 'jdbc:derby://localhost:1527/replicationdb';
ij> connect 'jdbc:derby://localhost:1527/replicationdb;startMaster=true;slaveHost=localhost;slavePort=8001';

Did a ctrl + c on slave here

ij(CONNECTION1)> connect 'jdbc:derby://localhost:1527/replicationdb;failover=true';
ERROR XRE21: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: replicationdbXRE21
ij(CONNECTION1)> exit;

This patch will conflict with DERBY-3428. Whichever patch is committed first
will break the other.

  
> Replication failover implementation must be modified to fail at the master after slave has been stopped
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3364
>                 URL: https://issues.apache.org/jira/browse/DERBY-3364
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.0.0
>            Reporter: V.Narayanan
>            Assignee: V.Narayanan
>         Attachments: Derby3364_v1.diff, Derby3364_v1.stat
>
>
> Jorgen says...
> I tried to run the failover command on the master, which seems to work fine as long as the master and slave are still connected. If the slave has been stopped for some reason, however, failover hangs on MasterController#startFailover here: 
> ack = transmitter.readMessage();
