Posted to derby-dev@db.apache.org by "John H. Embretsen (JIRA)" <ji...@apache.org> on 2008/04/22 15:46:23 UTC

[jira] Created: (DERBY-3639) Slave on Windows stops replication when network connection is broken, failover fails

Slave on Windows stops replication when network connection is broken, failover fails
------------------------------------------------------------------------------------

Key: DERBY-3639
URL: https://issues.apache.org/jira/browse/DERBY-3639
Project: Derby
Issue Type: Bug
Components: Replication
Affects Versions: 10.4.1.3
Environment: Master on Solaris 10 x86, Slave on Windows XP SP2.
Slave VM: Sun's Java HotSpot Client VM build 1.6.0_03-b05.
Reporter: John H. Embretsen

Replication: Failover on slave fails after network connection is broken (network cable to slave pulled out); database is shut down due to an "unexpected error".
Same experiment with a replicated embedded database on Linux (FC5) and Windows resulted in success on Linux and failure on Windows.

Documentation (admin guide, "Replication failure handling") says:

"Slave loses connection with master: The slave tries to reestablish the connection with the master by listening on the specified host and port. It will not give up until it is explicitly requested to do so by either the failover=true or stopSlave=true connection URL attribute. If a failover is requested, the slave applies all received log records and boots the database as described in Forcing a failover."

Slave console:

java -jar lib\derbyrun.jar ij
ij version 10.4
ij> connect 'jdbc:derby:replicDB;startSlave=true;slaveHost=0.0.0.0';
ERROR XRE08: Replication slave mode started successfully for database 'replicDB'. Connection refused because the database is in replication slave mode.
ij> connect 'jdbc:derby:replicDB';
ERROR 08004: Connection refused to database 'replicDB' because it is in replication slave mode.
ij> -- network cable unplugged from slave
ij> connect 'jdbc:derby:replicDB;failover=true';
ERROR XRE11: Could not perform operation 'failover' because the database 'replicDB' has not been booted.
ij> connect 'jdbc:derby:replicDB';
ij>

The slave's derby.log reported the following after the network cable was pulled and the failover command was issued:

---- BEGIN REPLICATION ERROR MESSAGE (4/22/08 2:01 PM) ----
Replication slave got a fatal error for database 'replicDB'. Replication will be stopped.
ERROR XRE03: Unexpected replication error. See derby.log for details.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(Unknown Source)
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(Unknown Source)
	at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
	at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
	at java.io.ObjectInputStream.readObject0(Unknown Source)
	at java.io.ObjectInputStream.readObject(Unknown Source)
	at org.apache.derby.impl.store.replication.net.SocketConnection.readMessage(Unknown Source)
	at org.apache.derby.impl.store.replication.net.ReplicationMessageReceive.readMessage(Unknown Source)
[snipped further stack traces]
-------------------- END REPLICATION ERROR MESSAGE ---------------------
Replication slave role was stopped for database 'replicDB'.
2008-04-22 12:01:42.921 GMT:
Shutting down instance 601a400f-0119-7600-eb23-000000383460
----------------------------------------------------------------

Full derby.log from the slave is attached.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3639) Slave on Windows stops replication when network connection is broken, failover fails

Posted by "John H. Embretsen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John H. Embretsen updated DERBY-3639:
-------------------------------------

    Attachment: derby.log

[jira] Commented: (DERBY-3639) Slave on Windows stops replication when network connection is broken, failover fails

Posted by "Øystein Grøvlen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591577#action_12591577 ] 

Øystein Grøvlen commented on DERBY-3639:
----------------------------------------

I think this occurs because the SlaveLogReceiverThread expects to get an EOFException from readMessage if the connection to the master is lost. However, on Windows it seems that a SocketException ('Connection reset') is thrown instead. A solution could be to handle SocketException the same way as EOFException. I guess this may also affect how SlaveController#handleDisconnect should behave.
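
A minimal sketch of that idea, not Derby's actual code: the class ConnectionLossSketch, the enum ReadOutcome, the interface MessageReader and the method readOnce are all hypothetical names made up for illustration. It only shows how a receiver loop could classify a SocketException as a lost connection, the same way as an EOFException, instead of as a fatal error.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.SocketException;

/** Hypothetical illustration only; none of these names exist in Derby. */
final class ConnectionLossSketch {

    /** Possible results of one attempt to read a replication message. */
    enum ReadOutcome { MESSAGE_RECEIVED, CONNECTION_LOST, FATAL_ERROR }

    /** Stand-in for a readMessage() call that may fail with an I/O error. */
    interface MessageReader {
        Object readMessage() throws IOException, ClassNotFoundException;
    }

    /** Reads one message and classifies the result. */
    static ReadOutcome readOnce(MessageReader reader) {
        try {
            reader.readMessage();
            return ReadOutcome.MESSAGE_RECEIVED;
        } catch (EOFException | SocketException e) {
            // Both exceptions are treated as "connection to the master lost".
            // The report shows SocketException ("Connection reset") on Windows
            // where EOFException was expected.
            return ReadOutcome.CONNECTION_LOST;
        } catch (IOException | ClassNotFoundException e) {
            // Anything else is still treated as an unexpected error.
            return ReadOutcome.FATAL_ERROR;
        }
    }
}
```

With this classification, a caller would keep waiting for a reconnect or a failover request on CONNECTION_LOST and stop replication only on FATAL_ERROR.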

Workaround:  Do a normal boot of the (former) slave database if the failover command fails.  Given that the slave failure occurred when the network connection was lost, this should give the same database state as a failover. 
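
For an application that connects through JDBC, the workaround could be sketched roughly as follows. This is an assumption-laden illustration: FailoverWorkaroundSketch, Connector and failoverOrBoot are hypothetical names, and only the two connection URLs are taken from the report. The sketch is deliberately coarse: it falls back to a normal boot whenever the failover attempt throws, without inspecting the SQLState.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical illustration of the workaround; not part of Derby. */
final class FailoverWorkaroundSketch {

    /** Opens a connection for a JDBC URL; injected so the logic can be tried without Derby. */
    interface Connector {
        Connection connect(String url) throws SQLException;
    }

    /** URL form used in the report to request a failover. */
    static String failoverUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";failover=true";
    }

    /** URL form used in the report for a normal boot. */
    static String normalBootUrl(String dbName) {
        return "jdbc:derby:" + dbName;
    }

    /** Tries a failover first; if that attempt throws, boots the database normally. */
    static Connection failoverOrBoot(String dbName, Connector connector) throws SQLException {
        try {
            return connector.connect(failoverUrl(dbName));
        } catch (SQLException failoverError) {
            // In the report this attempt failed with XRE11 because the slave
            // had already stopped; the normal boot that followed succeeded.
            // If the normal boot also fails, its exception propagates.
            return connector.connect(normalBootUrl(dbName));
        }
    }
}
```

With the Derby embedded driver on the classpath, a call such as failoverOrBoot("replicDB", java.sql.DriverManager::getConnection) would use real connections.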

[jira] Updated: (DERBY-3639) Slave on Windows stops replication when network connection is broken, failover fails

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3639:
--------------------------------------

      Issue & fix info: [Workaround attached]
               Urgency: Normal
    Bug behavior facts: [Crash]

Triaged for 10.5.2.

[jira] Updated: (DERBY-3639) Slave on Windows stops replication when network connection is broken, failover fails

Posted by "John H. Embretsen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John H. Embretsen updated DERBY-3639:
-------------------------------------

    Priority: Minor  (was: Major)

Downgraded Priority to Minor since the workaround is pretty straightforward and seems to work fine.
