Posted to derby-dev@db.apache.org by "Dag H. Wanvik (JIRA)" <ji...@apache.org> on 2009/05/16 01:12:46 UTC

[jira] Updated: (DERBY-3719) '...replication.buffer.LogBufferFullException' causes failover to fail w/ 'XRE07, SQLERRMC: Could not perform operation because the database is not in replication master mode.'

     [ https://issues.apache.org/jira/browse/DERBY-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-3719:
---------------------------------

    Attachment: traceLogShipping.stat
                traceLogShipping.diff

This log from master's log file shows what happens. The output is
produced by the patch attached (traceLogShipping):

@1242410514203 Sending done
@1242410514205 >= FI_HIGH
@1242410514206 >= FI_HIGH
@1242410514204 Sending
@1242410514208 >= FI_HIGH
@1242410514211 >= FI_HIGH
@1242410514216 log buffer full, try to force flush
@1242410514216 forceflush
@1242410514265 Sending done
@1242410514266 log buffer full, force failed
----  BEGIN REPLICATION ERROR MESSAGE (15.05.09 20:01) ----
@1242410514267 Sending
@1242410514286 Sending done
@1242410514286 Sending
Exception occurred during log shipping.
org.apache.derby.impl.store.replication.buffer.LogBufferFullException
	at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(ReplicationLogBuffer.java:357)

The asynchronous log shipper basically runs this loop:

       while (true) {
           ship a log chunk

           if (! <things are busy>) {
               wait(shippingInterval)
           }
       }
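
For readers who prefer Java to pseudocode, the loop above can be
sketched roughly as follows. This is only an illustration: the class
ShipperLoopSketch, its fields, and the reading of "things are busy" as
"more chunks are already queued" are assumptions, not Derby's
AsynchronousLogShipper. The loop is also bounded so that it terminates.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the shipping loop; not Derby's AsynchronousLogShipper.
final class ShipperLoopSketch {
    private final Object monitor = new Object();
    private final Queue<byte[]> chunks = new ArrayDeque<>();
    private final long shippingIntervalMillis; // must be > 0; wait(0) blocks forever
    int shipped = 0; // chunks "sent" so far
    int waits = 0;   // times the loop went to sleep

    ShipperLoopSketch(long shippingIntervalMillis) {
        this.shippingIntervalMillis = shippingIntervalMillis;
    }

    // Called by a writer to hand over a chunk and signal the shipper.
    void enqueue(byte[] chunk) {
        synchronized (monitor) {
            chunks.add(chunk);
            monitor.notify();
        }
    }

    // The original loop is "while (true)"; it is bounded here so it terminates.
    void runLoop(int iterations) {
        for (int i = 0; i < iterations; i++) {
            byte[] chunk;
            synchronized (monitor) {
                chunk = chunks.poll();
            }
            if (chunk != null) {
                shipped++; // stand-in for the network send
            }
            synchronized (monitor) {
                // Assumed meaning of "things are busy": more chunks are queued.
                boolean busy = !chunks.isEmpty();
                if (!busy) {
                    waits++;
                    try {
                        monitor.wait(shippingIntervalMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }
    }
}
```

With three chunks queued and three iterations, the sketch ships all
three and sleeps only once, after the last chunk, because the queue is
non-empty after the first two sends.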

From derby.log, we see that the sending of a chunk starts at instant
4204, and sending is complete at 4265.

In the meantime, the user thread is busy writing log using
ReplicationLogBuffer.appendLog. Now, the buffer is getting full as
seen in the "instant 4208: >= FI_HIGH" (ReplicationLogBuffer calls
switchDirtyBuffer which will return a free buffer if there is one,
then it calls MasterController.workToDo to make sure the shipping
thread knows). 

Notice the send is still waiting to complete.  Now, another log write
happens, and again, we see this indication that we are close to having
0 free buffers (instant 4211: >= FI_HIGH). This is the second time workToDo is
called. In both cases, AsynchronousLogShipper.workToDo does a notify
to try to wake up the thread on the assumption that it may be sleeping in
the "wait(shippingInterval)" seen above. But in reality, the shipping
thread is still waiting for its currently active send to finish, so
the notify has no effect.
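
The underlying Java behavior is that Object.notify() is not remembered:
if no thread is inside wait() on that monitor when notify() is called,
the signal is dropped. A minimal sketch of that, replaying "signal
first, wait second" in a single thread so the result is deterministic
(the class and method names are hypothetical stand-ins, not Derby
code):

```java
// Hypothetical demo: a notify() with no thread in wait() is simply dropped.
final class LostNotifySketch {
    private final Object monitor = new Object();

    // Stand-in for workToDo(): signal the shipper.
    void workToDo() {
        synchronized (monitor) {
            monitor.notify(); // no effect unless some thread is in monitor.wait()
        }
    }

    // Stand-in for the shipper reaching wait(shippingInterval) after its send.
    // Returns how long the wait lasted, in milliseconds.
    long waitForWork(long intervalMillis) {
        long start = System.nanoTime();
        synchronized (monitor) {
            try {
                monitor.wait(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Replays the order seen in the log: the signal arrives while the
    // shipper is busy sending, and only afterwards does it call wait().
    static long signalThenWait(long intervalMillis) {
        LostNotifySketch sketch = new LostNotifySketch();
        sketch.workToDo();
        return sketch.waitForWork(intervalMillis);
    }
}
```

Barring a spurious wakeup, LostNotifySketch.signalThenWait(100) sleeps
for about the whole interval, since the earlier notify() left no trace.
The usual remedy is to pair the notify with a flag that the waiter
checks before sleeping.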

Now, have a look at MasterController.appendLog (which calls
ReplicationLogBuffer.appendLog). Notice that if it gets a
LogBufferFullException it will try to force the log shipper to flush,
and then retry the append on the assumption that at least one free
buffer has been returned to the pool.  Now, have a look at the code of
AsynchronousLogShipper.forceFlush (called at instant 4216). What this
code does is to try to wake up the sleeping shipper thread with the
call to notify(). Sadly, the shipping thread is still not finished with
its write (it takes 4265 - 4204 = 61 ms), so the forceflush just returns
and allows MasterController.appendLog to fail for the second time. And
this time the LogBufferFullException is the kiss of death (4266 log
buffer full, force failed).
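
The sequence can be condensed into a toy, single-threaded model. It is
a sketch under stated assumptions, not the Derby code: the names are
made up, and the model assumes that a force flush frees a buffer only
when no send is in progress (i.e. when the notify actually reaches a
sleeping shipper).

```java
// Toy, single-threaded model of the append / force flush / retry path.
// Names and behavior are assumptions for illustration, not Derby's code.
final class AppendRetrySketch {
    static final class BufferFullException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    private int freeBuffers;
    private final boolean sendInProgress;

    AppendRetrySketch(int freeBuffers, boolean sendInProgress) {
        this.freeBuffers = freeBuffers;
        this.sendInProgress = sendInProgress;
    }

    private void append() throws BufferFullException {
        if (freeBuffers == 0) {
            throw new BufferFullException();
        }
        freeBuffers--;
    }

    // Models a notify()-only force flush: if the shipper is in the middle
    // of a send, the notify is lost and no buffer comes back to the pool.
    private void forceFlush() {
        if (!sendInProgress) {
            freeBuffers++; // assumed: an idle shipper wakes up and frees a buffer
        }
    }

    // Returns true if the append succeeded, false if the retry also failed.
    boolean appendWithOneRetry() {
        try {
            append();
            return true;
        } catch (BufferFullException first) {
            forceFlush();
            try {
                append();
                return true;
            } catch (BufferFullException second) {
                return false; // "log buffer full, force failed"
            }
        }
    }
}
```

In this model, new AppendRetrySketch(0, true).appendWithOneRetry()
returns false, which corresponds to the "force failed" line at instant
4266, while the same call with sendInProgress set to false succeeds on
the retry.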

Notice how the shipping thread still thinks all is hunky dory: it
starts a new ship at instant 4267, but, alas, it is now too late, since
the master thread has given up and started to tear down.

So, in conclusion, the logic to force log shipping before we attempt a
retry of the log append is flawed.
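
This message does not propose a fix. Purely as an illustration of what
"wait until a buffer has actually been freed" could look like, here is
a hedged sketch in which the appender blocks, with a timeout, on a
condition that the shipper signals when a send completes. All names
are hypothetical and this is not presented as the change made in
Derby.

```java
// Hypothetical sketch: block until the shipper has freed a buffer,
// instead of returning right after notify(). Not Derby's code.
final class BlockingFlushSketch {
    private final Object lock = new Object();
    private int freeBuffers;

    BlockingFlushSketch(int freeBuffers) {
        this.freeBuffers = freeBuffers;
    }

    // Called by the shipper thread when a send completes and a buffer is free.
    void bufferShipped() {
        synchronized (lock) {
            freeBuffers++;
            lock.notifyAll();
        }
    }

    // Waits until at least one buffer is free or the timeout expires.
    // Returns true if a buffer is free on return.
    boolean awaitFreeBuffer(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (freeBuffers == 0) { // re-check the condition after every wakeup
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false;
                }
                try {
                    lock.wait(Math.max(1L, remainingNanos / 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    // Takes a buffer if one is free; returns false when the pool is empty.
    boolean tryAppend() {
        synchronized (lock) {
            if (freeBuffers == 0) {
                return false;
            }
            freeBuffers--;
            return true;
        }
    }
}
```

Because the waiter checks the freeBuffers count rather than relying on
the notify alone, it does not matter whether bufferShipped() runs
before or after awaitFreeBuffer() starts waiting.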


> '...replication.buffer.LogBufferFullException' causes failover to fail w/ 'XRE07, SQLERRMC: Could not perform operation because the database is not in replication master mode.'
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3719
>                 URL: https://issues.apache.org/jira/browse/DERBY-3719
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.2.0, 10.5.1.1
>         Environment: HW: 2 X i86pc i386 (AMD Opteron(tm) Processor 252): 2593 MHz, unknown cache. 3968 Megabytes Total Memory.
> OS: Solaris 10 5/08 s10x_u5wos_10 X86 64bits - SunOS 5.10 Generic_127128-11
> JVM: Sun Microsystems Inc.
>     java version "1.6.0_06"
>     Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
>     Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode)
>            Reporter: Ole Solberg
>         Attachments: 12.tar.gz, traceLogShipping.diff, traceLogShipping.stat
>
>
> With the patch for DERBY-3709, derby-3709_p1-v2.diff.txt,  I was able to provoke this error twice in 30 test runs on this platform (On another platform I saw none in 100 test runs.)
> I will upload the full test run log dir.
> "Summary":
> 1) testReplication_Local_StateTest_part2(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_StateTest_part2)junit.framework.ComparisonFailure: Unexpected SQL state. expected:<XRE[20]> but was:<XRE[07]>
> Master derby.log:
> -----------------------------------------
> ----  BEGIN REPLICATION ERROR MESSAGE (6/10/08 4:08 PM) ----
> Exception occurred during log shipping.
> org.apache.derby.impl.store.replication.buffer.LogBufferFullException
> 	at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(ReplicationLogBuffer.java:357)
> 	at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.appendLog(ReplicationLogBuffer.java:146)
> 	at org.apache.derby.impl.store.replication.master.MasterController.appendLog(MasterController.java:428)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.writeToLog(LogAccessFile.java:787)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.flushDirtyBuffers(LogAccessFile.java:534)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.flushLogAccessFile(LogAccessFile.java:574)
> 	at org.apache.derby.impl.store.raw.log.LogAccessFile.writeLogRecord(LogAccessFile.java:332)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.appendLogRecord(LogToFile.java:3759)
> 	at org.apache.derby.impl.store.raw.log.FileLogger.logAndDo(FileLogger.java:370)
> 	at org.apache.derby.impl.store.raw.xact.Xact.logAndDo(Xact.java:1193)
> 	at org.apache.derby.impl.store.raw.data.LoggableActions.doAction(LoggableActions.java:221)
> 	at org.apache.derby.impl.store.raw.data.LoggableActions.actionUpdate(LoggableActions.java:85)
> 	at org.apache.derby.impl.store.raw.data.StoredPage.doUpdateAtSlot(StoredPage.java:8463)
> 	at org.apache.derby.impl.store.raw.data.StoredPage.updateOverflowDetails(StoredPage.java:8336)
> 	at org.apache.derby.impl.store.raw.data.StoredPage.updateOverflowDetails(StoredPage.java:8319)
> 	at org.apache.derby.impl.store.raw.data.BasePage.insertAllowOverflow(BasePage.java:808)
> 	at org.apache.derby.impl.store.raw.data.BasePage.insert(BasePage.java:653)
> 	at org.apache.derby.impl.store.access.heap.HeapController.doInsert(HeapController.java:307)
> 	at org.apache.derby.impl.store.access.heap.HeapController.insert(HeapController.java:575)
> 	at org.apache.derby.impl.sql.execute.RowChangerImpl.insertRow(RowChangerImpl.java:457)
> 	at org.apache.derby.impl.sql.execute.InsertResultSet.normalInsertCore(InsertResultSet.java:1011)
> 	at org.apache.derby.impl.sql.execute.InsertResultSet.open(InsertResultSet.java:487)
> 	at org.apache.derby.impl.sql.GenericPreparedStatement.execute(GenericPreparedStatement.java:384)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(EmbedStatement.java:1235)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(EmbedPreparedStatement.java:1652)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(EmbedPreparedStatement.java:1307)
> 	at org.apache.derby.impl.drda.DRDAStatement.execute(DRDAStatement.java:672)
> 	at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTTobjects(DRDAConnThread.java:4197)
> 	at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTT(DRDAConnThread.java:4001)
> 	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:991)
> 	at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:278)
> --------------------  END REPLICATION ERROR MESSAGE ---------------------
> Slave derby.log:
> -------------------------------------------------------------------------------------------
> 2008-06-10 14:05:56.408 GMT Thread[DRDAConnThread_3,5,main] (DATABASE = /export/home/tmp/os136789/testingInMyDerbySandbox/12/db_slave/wombat), (DRDAID = {2}), Replication slave mode started successfully for database '/export/home/tmp/os136789/testingInMyDerbySandbox/12/db_slave/wombat'. Connection refused because the database is in replication slave mode. 
> Replication slave role was stopped for database '/export/home/tmp/os136789/testingInMyDerbySandbox/12/db_slave/wombat'.
> ------------  BEGIN SHUTDOWN ERROR STACK -------------
> ERROR XSLA7: Cannot redo operation null in the log.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:296)
> 	at org.apache.derby.impl.store.raw.log.FileLogger.redo(FileLogger.java:1525)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.recover(LogToFile.java:920)
> 	at org.apache.derby.impl.store.raw.RawStore.boot(RawStore.java:334)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)
> 	at org.apache.derby.impl.store.access.RAMAccessManager.boot(RAMAccessManager.java:1019)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(BaseMonitor.java:1999)
> 	at org.apache.derby.impl.services.monitor.TopService.bootModule(TopService.java:291)
> 	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(BaseMonitor.java:553)
> 	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Monitor.java:427)
> 	at org.apache.derby.impl.db.BasicDatabase.bootStore(BasicDatabase.java:780)
> 	at org.apache.derby.impl.db.BasicDatabase.boot(BasicDatabase.java:196)
> 	at org.apache.derby.impl.db.SlaveDatabase.bootBasicDatabase(SlaveDatabase.java:424)
> 	at org.apache.derby.impl.db.SlaveDatabase.access$000(SlaveDatabase.java:70)
> 	at org.apache.derby.impl.db.SlaveDatabase$SlaveDatabaseBootThread.run(SlaveDatabase.java:311)
> 	at java.lang.Thread.run(Thread.java:619)
> Caused by: ERROR 08006: Database '{0}' shutdown.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:276)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.stopReplicationSlaveRole(LogToFile.java:5142)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.stopSlave(SlaveController.java:266)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.access$500(SlaveController.java:64)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:531)
> ============= begin nested exception, level (1) ===========
> ERROR 08006: Database '{0}' shutdown.
> 	at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:276)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.stopReplicationSlaveRole(LogToFile.java:5142)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.stopSlave(SlaveController.java:266)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController.access$500(SlaveController.java:64)
> 	at org.apache.derby.impl.store.replication.slave.SlaveController$SlaveLogReceiverThread.run(SlaveController.java:531)
> ============= end nested exception, level (1) ===========
> ------------  END SHUTDOWN ERROR STACK -------------
