Posted to derby-user@db.apache.org by benrahman <md...@gmail.com> on 2013/06/11 10:50:23 UTC

Replication Master stop, but Slave still alive

Hi All,

I have set up replication following the steps on the ReplicationWriteup wiki page
(http://wiki.apache.org/db-derby/ReplicationWriteup).

After starting replication, I checked derby.log on both sides. Both logs show that
replication started successfully.
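For readers who have not seen the setup, here is a minimal sketch of the two connection URLs that start replication. The host, server ports (1527 and 1528), database name and replication port (8001) are taken from the logs in this message; that the poster used exactly these URLs is an assumption, and the class and method names are only illustrative.

```java
// Sketch: build the Derby client URLs that start the slave and master roles.
// Values come from the derby.log excerpts in this thread; the helper names
// are hypothetical and not part of Derby.
final class ReplicationUrls {

    // Builds a network-client URL carrying one replication command attribute.
    static String url(String serverHost, int serverPort, String db,
                      String command, String slaveHost, int slavePort) {
        return "jdbc:derby://" + serverHost + ":" + serverPort + "/" + db
                + ";" + command + "=true"
                + ";slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // Issued against the slave's network server (port 1528 in the log).
    static String startSlaveUrl() {
        return url("localhost", 1528, "house", "startSlave", "localhost", 8001);
    }

    // Issued against the master's network server (port 1527 in the log).
    static String startMasterUrl() {
        return url("localhost", 1527, "house", "startMaster", "localhost", 8001);
    }

    public static void main(String[] args) {
        System.out.println(startSlaveUrl());
        System.out.println(startMasterUrl());
    }
}
```

The slave is started first, which matches the timestamps in the logs: the slave role starts at 10:18:04 and the master role at 10:18:15.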

/Master derby.log/

Wed Jun 05 10:17:10 SGT 2013 : Apache Derby Network Server - 10.8.2.2 -
(1181258) started and ready to accept connections on port 1527
----------------------------------------------------------------
Wed Jun 05 10:18:15 SGT 2013:
Booting Derby version The Apache Software Foundation - Apache Derby -
10.8.2.2 - (1181258): instance a816c00e-013f-121f-a03d-ffff87382be3 
on database directory E:\Derby\databases\house  with class loader
sun.misc.Launcher$AppClassLoader@11b86e7 
Loaded from file:/E:/Derby/javadb/lib/derby.jar
java.vendor=Sun Microsystems Inc.
java.runtime.version=1.6.0_25-b06
user.dir=E:\user\Derby\Replication Testing Kit
derby.system.home=E:\Derby\databases
Database Class Loader started - derby.database.classpath=''
*Replication master role started for database 'house'.*

/Slave derby.log/

Wed Jun 05 10:17:12 SGT 2013 : Apache Derby Network Server - 10.8.2.2 -
(1181258) started and ready to accept connections on port 1528
----------------------------------------------------------------
Wed Jun 05 10:18:03 SGT 2013:
Booting Derby version The Apache Software Foundation - Apache Derby -
10.8.2.2 - (1181258): instance a816c00e-013f-133d-bfb3-00007314807e 
on database directory E:\Derby\databases\Slave\house  with class loader
sun.misc.Launcher$AppClassLoader@11b86e7 
Loaded from file:/E:/Derby/javadb/lib/derby.jar
java.vendor=Sun Microsystems Inc.
java.runtime.version=1.6.0_25-b06
user.dir=E:\user\Derby\Replication Testing Kit
derby.system.home=E:\Derby\databases\Slave
Database Class Loader started - derby.database.classpath=''
----------------------------------------------------------------
Wed Jun 05 10:18:04 SGT 2013:
Shutting down instance a816c00e-013f-133d-bfb3-00007314807e on database
directory E:\Derby\databases\Slave\house with class loader
sun.misc.Launcher$AppClassLoader@11b86e7 
----------------------------------------------------------------
Wed Jun 05 10:18:04 SGT 2013:
Booting Derby version The Apache Software Foundation - Apache Derby -
10.8.2.2 - (1181258): instance 601a400f-013f-133d-bfb3-00007314807e 
on database directory E:\Derby\databases\Slave\house  with class loader
sun.misc.Launcher$AppClassLoader@11b86e7 
Loaded from file:/E:/Derby/javadb/lib/derby.jar
java.vendor=Sun Microsystems Inc.
java.runtime.version=1.6.0_25-b06
user.dir=E:\user\Derby\Replication Testing Kit
derby.system.home=E:\Derby\databases\Slave
Replication slave database 'house' listens for connections from master on
'localhost:8001'.
Replication slave role started for database 'house'.
Wed Jun 05 10:18:16 SGT 2013 Thread[DRDAConnThread_3,5,main] Cleanup action starting
ERROR XRE08: Replication slave mode started successfully for database 'house'. Connection refused because the database is in replication slave mode.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(Unknown Source)
	at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
	at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
	at org.apache.derby.jdbc.EmbeddedDriver.connect(Unknown Source)
	at org.apache.derby.impl.drda.Database.makeConnection(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.getConnFromDatabaseName(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.verifyUserIdPassword(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.parseSECCHK(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.parseDRDAConnection(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
Cleanup action completed
Wed Jun 05 10:18:16 SGT 2013 Thread[DRDAConnThread_3,5,main] Cleanup action starting
ERROR XRE08: Replication slave mode started successfully for database 'house'. Connection refused because the database is in replication slave mode.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection40.<init>(Unknown Source)
	at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
	at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
	at org.apache.derby.jdbc.EmbeddedDriver.connect(Unknown Source)
	at org.apache.derby.impl.drda.Database.makeConnection(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.getConnFromDatabaseName(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.verifyUserIdPassword(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.parseSECCHK(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.parseDRDAConnection(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
Cleanup action completed
Wed Jun 05 10:18:16 SGT 2013 Thread[DRDAConnThread_3,5,main] (DATABASE =
house), (DRDAID = {1}), *Replication slave mode started successfully for
database 'house'. Connection refused because the database is in replication
slave mode. *
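The XRE08 message in the slave log is the expected response to any ordinary connection attempt while the database is in slave mode. A client that wants to tell this case apart from a real failure could check the SQLState, roughly as in the sketch below. It assumes the SQLState reaches the client unchanged as "XRE08" (the code shown in the log); the class and method names are made up for illustration.

```java
import java.sql.SQLException;

// Sketch: recognise "database is in replication slave mode" refusals.
// Assumes the client sees the same SQLState ("XRE08") that the server logs.
final class SlaveModeCheck {

    static final String SLAVE_MODE_SQLSTATE = "XRE08";

    // Walks the chain of SQLExceptions and reports whether any link
    // carries the slave-mode SQLState.
    static boolean refusedBecauseSlaveMode(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (SLAVE_MODE_SQLSTATE.equals(cur.getSQLState())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SQLException refused = new SQLException(
                "Connection refused because the database is in replication slave mode.",
                SLAVE_MODE_SQLSTATE);
        System.out.println(refusedBecauseSlaveMode(refused));
    }
}
```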

Then, after a while, I noticed that the master had lost the connection and was
trying to reconnect to the slave.

/Master derby.log/

----  BEGIN REPLICATION ERROR MESSAGE (6/5/13 3:35 PM) ----
Exception occurred during log shipping.
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1847)
	at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1756)
	at java.io.ObjectOutputStream.reset(ObjectOutputStream.java:483)
	at org.apache.derby.impl.store.replication.net.SocketConnection.writeMessage(Unknown Source)
	at org.apache.derby.impl.store.replication.net.ReplicationMessageTransmit.sendMessage(Unknown Source)
	at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.shipALogChunk(Unknown Source)
	at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.run(Unknown Source)

--------------------  END REPLICATION ERROR MESSAGE ---------------------
*Replication master trying to reconnect to slave for database 'house'.*

I checked the slave's derby.log; it still shows only the slave mode message.
The replication master eventually went down a few hours later.

/Master derby.log/

----  BEGIN REPLICATION ERROR MESSAGE (6/6/13 5:46 PM) ----
Exception occurred during log shipping.
org.apache.derby.impl.store.replication.buffer.LogBufferFullException
	at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(Unknown Source)
	at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.appendLog(Unknown Source)
	at org.apache.derby.impl.store.replication.master.MasterController.appendLog(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogAccessFile.writeToLog(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogAccessFile.flushDirtyBuffers(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
	at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source)
	at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown Source)
	at org.apache.derby.impl.store.raw.xact.Xact.xa_prepare(Unknown Source)
	at org.apache.derby.impl.store.access.RAMTransaction.xa_prepare(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.xa_prepare(Unknown Source)
	at org.apache.derby.jdbc.XATransactionState.xa_prepare(Unknown Source)
	at org.apache.derby.jdbc.EmbedXAResource.prepare(Unknown Source)
	at org.apache.derby.impl.drda.DRDAXAProtocol.prepareXATransaction(Unknown Source)
	at org.apache.derby.impl.drda.DRDAXAProtocol.parseSYNCCTL(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)

--------------------  END REPLICATION ERROR MESSAGE ---------------------
----  BEGIN REPLICATION ERROR MESSAGE (6/6/13 5:46 PM) ----
Exception occurred during log shipping.
java.net.SocketException: Software caused connection abort: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1847)
	at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1756)
	at java.io.ObjectOutputStream.reset(ObjectOutputStream.java:483)
	at org.apache.derby.impl.store.replication.net.SocketConnection.writeMessage(Unknown Source)
	at org.apache.derby.impl.store.replication.net.ReplicationMessageTransmit.sendMessage(Unknown Source)
	at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.shipALogChunk(Unknown Source)
	at org.apache.derby.impl.store.replication.master.AsynchronousLogShipper.flushBuffer(Unknown Source)
	at org.apache.derby.impl.store.replication.master.MasterController.stopMaster(Unknown Source)
	at org.apache.derby.impl.store.replication.master.MasterController.printStackAndStopMaster(Unknown Source)
	at org.apache.derby.impl.store.replication.master.MasterController.appendLog(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogAccessFile.writeToLog(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogAccessFile.flushDirtyBuffers(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
	at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
	at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source)
	at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown Source)
	at org.apache.derby.impl.store.raw.xact.Xact.xa_prepare(Unknown Source)
	at org.apache.derby.impl.store.access.RAMTransaction.xa_prepare(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.xa_prepare(Unknown Source)
	at org.apache.derby.jdbc.XATransactionState.xa_prepare(Unknown Source)
	at org.apache.derby.jdbc.EmbedXAResource.prepare(Unknown Source)
	at org.apache.derby.impl.drda.DRDAXAProtocol.prepareXATransaction(Unknown Source)
	at org.apache.derby.impl.drda.DRDAXAProtocol.parseSYNCCTL(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
	at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)

--------------------  END REPLICATION ERROR MESSAGE ---------------------
*Replication master role stopped for database 'house'.*

I tried to connect to the slave database, but it is still in slave mode.
I need to run stopSlave in order to access the database.

Has anyone encountered this situation before? What caused the master to go
down?

I would appreciate any explanations. Thanks!



--
View this message in context: http://apache-database.10148.n7.nabble.com/Replication-Master-stop-but-Slave-still-alive-tp131072.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.

Re: Replication Master stop, but Slave still alive

Posted by Dag Wanvik <da...@oracle.com>.

On 13.06.2013 16:00, benrahman wrote:
> Is there any way to check if slave did receive any log shipping from
> master?

I guess you should be able to see the database log files (not derby.log)
increase in size and number as log records are read and processed on the
slave.
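A rough way to automate that check is to take two snapshots of the slave's transaction log directory and compare them. The sketch below assumes the log files live in the `log` subdirectory of the slave database directory seen in derby.log (E:\Derby\databases\Slave\house) and are named log*.dat, as described elsewhere in this thread. It uses java.nio.file, so it needs Java 7 or later; the class name and the one-minute wait are arbitrary choices.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: compare two snapshots of the slave's log*.dat files.
final class SlaveLogSnapshot {

    // Maps each log*.dat file name in logDir to its size in bytes.
    static SortedMap<String, Long> snapshot(Path logDir) throws IOException {
        SortedMap<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(logDir, "log*.dat")) {
            for (Path f : files) {
                sizes.put(f.getFileName().toString(), Files.size(f));
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws Exception {
        // Default path is an assumption based on the slave's derby.log.
        Path logDir = Paths.get(args.length > 0
                ? args[0]
                : "E:\\Derby\\databases\\Slave\\house\\log");
        SortedMap<String, Long> before = snapshot(logDir);
        Thread.sleep(60_000); // wait while the master is doing some work
        SortedMap<String, Long> after = snapshot(logDir);
        System.out.println("before: " + before);
        System.out.println("after:  " + after);
        System.out.println(before.equals(after)
                ? "no change in log files"
                : "log files changed");
    }
}
```

If nothing changes while the master is committing transactions, that suggests the slave is not receiving log records.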

Dag


>> That's right. It is supposed to try to reconnect until there's no more
>> space in the replication log buffers, according to
>> http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailures.html.
>>
>> I think the slave never fails over automatically, even if it detects
>> that it has lost contact with the master. It has to be told to do so.
>> See
>> http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailover.html,
>> which says:
>>
>>   There is no automatic failover or restart of replication after one of
>>   the instances has failed.
>>
>>
>> -- 
>> Knut Anders
> Meaning slave will not stop automatically even though the connection with
> master has lost? I guess that will explained why my replication slave still
> active, even when master already stop.
>
> Ben

Re: Replication Master stop, but Slave still alive

Posted by benrahman <md...@gmail.com>.

Hi All, Thanks for the reply.


Dag Wanvik wrote
> Looks like the socket the master uses to ship records to slave stopped
> working; hard to say what's the issue here. Do you see anything in the
> slave's log file at this time instant?

If you mean derby.log, nothing new was written; the last message says that the
database is in slave mode. I also checked the log folder that contains the .dat
files, and the sequence number didn't increase. When I started replication it was
log1.dat, and when this happened it was still log1.dat.

Is there any way to check whether the slave received any shipped log from the master?


Knut Anders Hatlen-5 wrote
> That's right. It is supposed to try to reconnect until there's no more
> space in the replication log buffers, according to
> http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailures.html.
> 
> I think the slave never fails over automatically, even if it detects
> that it has lost contact with the master. It has to be told to do so.
> See
> http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailover.html,
> which says:
> 
>   There is no automatic failover or restart of replication after one of
>   the instances has failed.
> 
> 
> -- 
> Knut Anders

Meaning the slave will not stop automatically even though the connection with
the master has been lost? I guess that explains why my replication slave is still
active even though the master has already stopped.

Ben





Re: Replication Master stop, but Slave still alive

Posted by Knut Anders Hatlen <kn...@oracle.com>.

Dag Wanvik <da...@oracle.com> writes:

> On 11.06.2013 18:50, benrahman wrote:
>
>     /Master derby.log/
>     
>     ----  BEGIN REPLICATION ERROR MESSAGE (6/5/13 3:35 PM) ----
>     Exception occurred during log shipping.
>     java.net.SocketException: Connection reset by peer: socket write error
>             at java.net.SocketOutputStream.socketWrite0(Native Method)
>             at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>             at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>
> Looks like the socket the master uses to ship records to slave stopped working; hard to say what's the issue here. Do you see anything
> in the slave's log file at this time instant?
>
> Later replication error messages in the master's log file show that the buffer grows full (since it can't send):
>
>> ----  BEGIN REPLICATION ERROR MESSAGE (6/6/13 5:46 PM) ----
>> Exception occurred during log shipping.
>> org.apache.derby.impl.store.replication.buffer.LogBufferFullException
>>       at
>> org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(Unknown
>
> Not sure why the slave doesn't fail over; maybe the master process needs to be stopped (crash) before it will happen.
> It is probably right that it doesn't happen when you first see the socket write error; it could be due to an intermittent network error.

That's right. It is supposed to try to reconnect until there's no more
space in the replication log buffers, according to
http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailures.html.
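To make that behaviour concrete, here is a toy model of a bounded log buffer. It is not Derby's implementation: the class, its methods and the capacity of 10 chunks are all invented for illustration. It only shows why a master that cannot ship log records keeps running for a while and then fails once the buffer has no room left, which is the pattern in the log above (a socket error first, then LogBufferFullException a day later).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (hypothetical, not Derby code): a fixed-capacity queue of log
// chunks that is drained only while the link to the slave is up.
final class ToyLogBuffer {

    static final class BufferFullException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    private final int capacity;
    private final Deque<byte[]> pending = new ArrayDeque<>();

    ToyLogBuffer(int capacity) {
        this.capacity = capacity;
    }

    // Called for each new log chunk; fails once no free slot is left.
    void append(byte[] chunk) throws BufferFullException {
        if (pending.size() == capacity) {
            throw new BufferFullException();
        }
        pending.addLast(chunk);
    }

    // Ships the oldest chunk if the link is up; returns true if one was sent.
    boolean shipOne(boolean linkUp) {
        if (!linkUp || pending.isEmpty()) {
            return false;
        }
        pending.removeFirst();
        return true;
    }

    int pendingChunks() {
        return pending.size();
    }

    // Counts how many appends succeed while the link stays down.
    static int appendsUntilFull(int capacity) {
        ToyLogBuffer buf = new ToyLogBuffer(capacity);
        int accepted = 0;
        try {
            while (true) {
                buf.shipOne(false);        // shipping attempt fails, link is down
                buf.append(new byte[1]);   // the master keeps generating log
                accepted++;
            }
        } catch (BufferFullException e) {
            return accepted;
        }
    }

    public static void main(String[] args) {
        System.out.println("appends accepted before overflow: " + appendsUntilFull(10));
    }
}
```

How long the real master survives depends on how much log it generates, which would be consistent with the gap between the two error messages in the master's derby.log.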

> But I believe the slave and master have a keep-alive protocol to enable the slave to fail over when the master is no longer seen to be
> alive.

I think the slave never fails over automatically, even if it detects
that it has lost contact with the master. It has to be told to do so.
See http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailover.html,
which says:

  There is no automatic failover or restart of replication after one of
  the instances has failed.
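Telling the slave what to do is again done with a connection URL attribute. Below is a minimal sketch, assuming the slave's network server is still on localhost:1528 with database 'house' (values from the logs in this thread) and that the Derby client driver is on the classpath. `stopSlave` and `failover` are the attribute names I would expect to use here; as far as I understand the documentation, these commands tend to report their outcome through an SQLException even when they succeed, so the sketch prints the SQLState instead of treating every exception as a failure. The class and method names are illustrative only.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

// Sketch: manually issue stopSlave or failover against the slave database.
// Host, port and database name are taken from the logs in this thread.
final class ManualSlaveControl {

    // Builds the URL for a replication command sent to the slave's server.
    static String slaveCommandUrl(String command) {
        return "jdbc:derby://localhost:1528/house;" + command + "=true";
    }

    // Attempts the connection and returns a description of what happened.
    static String run(String command) {
        try {
            DriverManager.getConnection(slaveCommandUrl(command)).close();
            return "connected without exception";
        } catch (SQLException e) {
            return e.getSQLState() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Pass "failover" as the first argument to promote the slave instead.
        String command = args.length > 0 ? args[0] : "stopSlave";
        System.out.println(run(command));
    }
}
```

After stopSlave the database can be booted normally again; failover is the variant to use when the slave should take over as the working database.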


-- 
Knut Anders

Re: Replication Master stop, but Slave still alive

Posted by Dag Wanvik <da...@oracle.com>.

On 11.06.2013 18:50, benrahman wrote:
> /Master derby.log/
>
> ----  BEGIN REPLICATION ERROR MESSAGE (6/5/13 3:35 PM) ----
> Exception occurred during log shipping.
> java.net.SocketException: Connection reset by peer: socket write error
> 	at java.net.SocketOutputStream.socketWrite0(Native Method)
> 	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> 	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)

Looks like the socket the master uses to ship records to slave stopped
working; hard to say what's the issue here. Do you see anything in the
slave's log file at this time instant?

Later replication error messages in the master's log file show that the
buffer grows full (since it can't send):

> ----  BEGIN REPLICATION ERROR MESSAGE (6/6/13 5:46 PM) ----
> Exception occurred during log shipping.
> org.apache.derby.impl.store.replication.buffer.LogBufferFullException
>	at
> org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(Unknown

Not sure why the slave doesn't fail over; maybe the master process needs
to be stopped (crash) before it will happen.
It is probably right that it doesn't happen when you first see the
socket write error; it could be due to an intermittent network error.
But I believe the slave and master have a keep-alive protocol to enable
the slave to fail over when the master is no longer seen to be alive.

Dag