You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-commits@db.apache.org by da...@apache.org on 2011/07/19 16:29:04 UTC

svn commit: r1148354 - in /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data: RAFContainer.java RAFContainer4.java

Author: dag
Date: Tue Jul 19 14:29:03 2011
New Revision: 1148354

URL: http://svn.apache.org/viewvc?rev=1148354&view=rev
Log:
DERBY-5325 Checkpoint fails with ClosedChannelException in InterruptResilienceTest

Patch derby-5325a:

With NIO, writeRAFHeader has two methods leading to interruptible IO:
 - getEmbryonicPage
 - writeHeader
 
Currently, getEmbryonicPage may throw InterruptDetectedException and
hence, so may writeRAFHeader.

writeHeader may throw ClosedByInterruptException,
AsynchronousCloseException and ClosedChannelException because
writeHeader does not use RAFContainer4#writePage, but rather uses
RAFContainer4#writeAtOffset, which does not currently attempt to
recover after interrupt.

So currently, clients of writeRAFHeader need to be prepared for all of
InterruptDetectedException, ClosedByInterruptException,
AsynchronousCloseException and ClosedChannelException.

writeRAFHeader is used in three locations:

 - RAFContainer#clean
 - RAFContainer#run(CREATE_CONTAINER_ACTION)
 - RAFContainer#run(STUBBIFY_ACTION)

RAFContainer#clean is prepared for InterruptDetectedException
only. The issue shows that ClosedChannelException may also occur, and
it is not prepared for that (this bug).

RAFContainer#run(CREATE_CONTAINER_ACTION) is prepared for
ClosedByInterruptException and AsynchronousCloseException. Since IO
during container creation is single-threaded, this is sufficient: it
should never need to handle
ClosedChannelException/InterruptDetectedException, both of which
signal that another thread saw interrupt on the container channel.

RAFContainer#run(STUBBIFY_ACTION) is part of the removeContainer
operation which should happen after the container is closed, so it
should be single-threaded on the container as well(?). It should
handle ClosedByInterruptException and AsynchronousCloseException and
do retry, but doesn't, currently.

If we let writeAtOffset clean up just like writePage,
RAFContainer4#writeAtOffset (i.e.also writeHeader) would only only
throw InterruptDetectedException, i.e. another thread saw interrupt,
so retry. This would simplify logic in RAFContainer: we could remove
the retry logic from RAFContainer#run(CREATE_CONTAINER_ACTION). This
could also cover retry logic for RAFContainer#run(STUBBIFY_ACTION) wrt
its use of writeRAFHeader.

Next, RAFContainer#clean is already handling
InterruptDetectedException and would with this change no longer see
ClosedByInterruptException, AsynchronousCloseException or
ClosedChannelException. This should solve DERBY-5325 (this bug).


Modified:
    db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
    db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java

Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java?rev=1148354&r1=1148353&r2=1148354&view=diff
==============================================================================
--- db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java (original)
+++ db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java Tue Jul 19 14:29:03 2011
@@ -1328,96 +1328,83 @@ class RAFContainer extends FileContainer
                  throw StandardException.newException( SQLState.FILE_CREATE, se, file);
              }
 
-             boolean success = false;
-             int maxTries = MAX_INTERRUPT_RETRIES;
-             while (!success) {
-                 success = true;
-
-                 try {
+             try {
 
-                     // OK not to force WAL here, in fact, this operation
-                     // preceeds the creation of the log record to ensure
-                     // sufficient space.
+                 // OK not to force WAL here, in fact, this operation
+                 // preceeds the creation of the log record to ensure
+                 // sufficient space.
 
-                     dataFactory.writeInProgress();
-                     try
+                 dataFactory.writeInProgress();
+                 try
                      {
                          fileData = file.getRandomAccessFile( "rw");
                      }
-                     finally
+                 finally
                      {
                          dataFactory.writeFinished();
                      }
 
-                     // This container format specifies that the first page is
-                     // an allocation page and the container information is
-                     // stored within it.  The allocation page needs to be
-                     // somewhat formatted because if the system crashed after
-                     // the create container log operation is written, it needs
-                     // to be well formed enough to get the container
-                     // information back out of it.
-                     //
-                     // Don't try to go thru the page cache here because the
-                     // container object cannot be found in the container cache
-                     // at this point yet.  However, if we use the page cache
-                     // to store the first allocation page, then in order to
-                     // write itself out, it needs to ask the container to do
-                     // so, which is going to create a deadlock.  The
-                     // allocation page cannot write itself out without going
-                     // thru the container because it doesn't know where its
-                     // offset is.  Here we effectively hardwire page 0 at
-                     // offset 0 of the container file to be the first
-                     // allocation page.
+                 // This container format specifies that the first page is
+                 // an allocation page and the container information is
+                 // stored within it.  The allocation page needs to be
+                 // somewhat formatted because if the system crashed after
+                 // the create container log operation is written, it needs
+                 // to be well formed enough to get the container
+                 // information back out of it.
+                 //
+                 // Don't try to go thru the page cache here because the
+                 // container object cannot be found in the container cache
+                 // at this point yet.  However, if we use the page cache
+                 // to store the first allocation page, then in order to
+                 // write itself out, it needs to ask the container to do
+                 // so, which is going to create a deadlock.  The
+                 // allocation page cannot write itself out without going
+                 // thru the container because it doesn't know where its
+                 // offset is.  Here we effectively hardwire page 0 at
+                 // offset 0 of the container file to be the first
+                 // allocation page.
+
+                 // create an embryonic page - if this is not a temporary
+                 // container, synchronously write out the file header.
+
+                 canUpdate = true; // Need to set it now. After writeRAFHeader
+                                   // may be too late in case that method's IO
+                                   // is interrupted and container needs
+                                   // reopening. To get the correct "rw" mode
+                                   // we need canUpdate to be true.
+
+                 writeRAFHeader(
+                     actionIdentity, fileData, true,
+                     (actionIdentity.getSegmentId() !=
+                      ContainerHandle.TEMPORARY_SEGMENT));
 
-                     // create an embryonic page - if this is not a temporary
-                     // container, synchronously write out the file header.
-                     writeRAFHeader(
-                         actionIdentity, fileData, true,
-                         (actionIdentity.getSegmentId() !=
-                          ContainerHandle.TEMPORARY_SEGMENT));
-
-                 } catch (IOException ioe) {
-                     Class clazz = ioe.getClass();
-
-                     // test with reflection since NIO is not in Foundation 1.1
-                     if (clazz.getName().equals(
-                             "java.nio.channels.ClosedByInterruptException") ||
-                         clazz.getName().equals( // Java NIO Bug 6979009:
-                             "java.nio.channels.AsynchronousCloseException")) {
-
-                         if (--maxTries > 0) {
-                             success = false;
-                             InterruptStatus.setInterrupted();
-                             closeContainer();
-                             continue;
-                         }
-                     }
+             } catch (IOException ioe) {
 
-                     boolean fileDeleted;
-                     try {
-                         fileDeleted = privRemoveFile(file);
-                     } catch (SecurityException se) {
-                         throw StandardException.newException(
-                             SQLState.FILE_CREATE_NO_CLEANUP,
-                             ioe,
-                             file,
-                             se.toString());
-                     }
+                 canUpdate = false;
 
-                     if (!fileDeleted) {
-                         throw StandardException.newException(
-                             SQLState.FILE_CREATE_NO_CLEANUP,
-                             ioe,
-                             file,
-                             ioe.toString());
-                     }
+                 boolean fileDeleted;
+                 try {
+                     fileDeleted = privRemoveFile(file);
+                 } catch (SecurityException se) {
+                     throw StandardException.newException(
+                         SQLState.FILE_CREATE_NO_CLEANUP,
+                         ioe,
+                         file,
+                         se.toString());
+                 }
 
+                 if (!fileDeleted) {
                      throw StandardException.newException(
-                         SQLState.FILE_CREATE, ioe, file);
+                         SQLState.FILE_CREATE_NO_CLEANUP,
+                         ioe,
+                         file,
+                         ioe.toString());
                  }
+
+                 throw StandardException.newException(
+                     SQLState.FILE_CREATE, ioe, file);
              }
 
-             canUpdate = true;
              return null;
          } // end of case CREATE_CONTAINER_ACTION
 

Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java
URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java?rev=1148354&r1=1148353&r2=1148354&view=diff
==============================================================================
--- db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (original)
+++ db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java Tue Jul 19 14:29:03 2011
@@ -622,10 +622,6 @@ class RAFContainer4 extends RAFContainer
                     threadsInPageIO,
                     hashCode());
 
-                // Recovery is in progress, wait for another
-                // interrupted thread to clean up, i.e. act as if we
-                // had seen ClosedChannelException.
-
                 awaitRestoreChannel(e, stealthMode);
                 if (retries-- == 0) {
                     throw StandardException.newException(
@@ -1089,10 +1085,68 @@ class RAFContainer4 extends RAFContainer
             throws IOException, StandardException
     {
         FileChannel ioChannel = getChannel(file);
-        if (ioChannel != null) {
-            writeFull(ByteBuffer.wrap(bytes), ioChannel, offset);
-        } else {
+
+        if (ioChannel == null) {
             super.writeAtOffset(file, bytes, offset);
+            return;
+        }
+
+        ourChannel = ioChannel;
+
+        boolean success = false;
+        boolean stealthMode = true;
+
+        while (!success) {
+
+            synchronized (this) {
+                // don't use ourChannel directly, could need re-initilization
+                // after interrupt and container reopening:
+                ioChannel = getChannel();
+            }
+
+            try {
+                writeFull(ByteBuffer.wrap(bytes), ioChannel, offset);
+                success = true;
+            //} catch (ClosedByInterruptException e) {
+            // Java NIO Bug 6979009:
+            // http://bugs.sun.com/view_bug.do?bug_id=6979009
+            // Sometimes NIO throws AsynchronousCloseException instead of
+            // ClosedByInterruptException
+            } catch (AsynchronousCloseException e) {
+                // Subsumes ClosedByInterruptException
+
+                // The interrupted thread may or may not get back here
+                // before other concurrent writers that will see
+                // ClosedChannelException, we have logic to handle that.
+
+                if (Thread.currentThread().isInterrupted()) {
+                    // Normal case
+                    if (recoverContainerAfterInterrupt(
+                                e.toString(),
+                                stealthMode)) {
+                        continue; // do I/O over again
+                    }
+                }
+                // Recovery is in progress, wait for another
+                // interrupted thread to clean up, i.e. act as if we
+                // had seen ClosedChannelException.
+
+                // stealthMode == true, so this will throw
+                // InterruptDetectedException
+                awaitRestoreChannel(e, stealthMode);
+            } catch (ClosedChannelException e) {
+                // We are not the thread that first saw the channel interrupt,
+                // so no recovery attempt.
+
+                InterruptStatus.noteAndClearInterrupt(
+                    "writeAtOffset in ClosedChannelException",
+                    threadsInPageIO,
+                    hashCode());
+
+                // stealthMode == true, so this will throw
+                // InterruptDetectedException
+                awaitRestoreChannel(e, stealthMode);
+            }
         }
     }