You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org> on 2006/08/18 15:37:14 UTC

[jira] Commented: (DERBY-801) Allow parallel access to data files.

    [ http://issues.apache.org/jira/browse/DERBY-801?page=comments#action_12429015 ] 
            
Knut Anders Hatlen commented on DERBY-801:
------------------------------------------

Hi Anders,

Thank you for working on this issue! I have looked at your patch, and
my impression is that your approach looks good and complete. A couple
of comments and questions follows.

First to your questions in JIRA and in the comments:

1) I don't think casting dataFile to RandomAccessFile is such an ugly
   hack. You check that it is an instance of RandomAccessFile before
   you cast it, and fall back to the old behaviour if it is not. Seems
   perfectly safe to me.

   If you're really determined to clean up the interfaces, I would
   suggest that the implementations of StorageRandomAccessFile
   contained a RandomAccessFile instance instead of extending the
   class. Then the StorageRandomAccessFile interface could be changed
   to provide a FileChannel-like abstraction. In that case, there
   would be only one implementation of RAFContainer (and it would look
   very much like your RAFContainer4), but the implementations of
   StorageRandomAccessFile would have to differ between different VMs
   (that is, on jvm>=1.4 they would use FileChannel under the hood, on
   jvm 1.3 they would pretend that they did, but actually serialize
   reads and writes).

2) In a comment to a try/finally statement where the finally clause
   only contains debug code, you write that you hope the compiler will
   optimize it away. I think this is a reasonable expectation.

Then to my own comments:

3) I'm not sure the access to needsSync is thread safe even though you
   have declared it as volatile. All accesses to it in RAFContainer
   are synchronized on the RAFContainer instance, but the one in
   RAFContainer4 is not. I think this can lead to race conditions,
   such as:

     Thread 1 is invoking RAFContainer.clean() which calls
     writeRAFHeader() and clearDirty() in a synchronized block.

     At the same time, thread 2 is executing RAFContainer4.writePage()
     which contains the assignment "needsSync = true" without any
     synchronization.

   It is possible that needsSync is assigned to true after thread 1
   invokes writeRAFHeader() but before it invokes clearDirty(). Since
   clearDirty() sets needsSync to false, the second thread's request
   for syncing disappears.

   If the assignment were changed to
      synchronized (this) { needsSync = true; }
   I think it would be thread safe (there would be no way to change
   the value of needsSync between writeRAFHeader() and clearDirty()).

4) Is the synchronization in RAFContainer4.updatePageArray() needed?
   There is no synchronization in RAFContainer.updatePageArray(). To
   me, it seems like RAFContainer.updatePageArray() could be used
   directly (but you'll have to check whether the database is
   encrypted and allocate a new encryption buffer if it is). Is there
   a particular reason why RAFContainer4 needs its own
   updatePageArray()?

5) Should there have been iosInProgress++ and iosInProgress-- in the
   "if (syncPage)" part of RAFContainer4.writePage() too?

6) The two new java files added by your patch use a mix of tabs and
   spaces as indentation character. It would be better if they
   consistently used spaces. And extra bonus if none of the lines
   exceed 80 characters. :)

7) Apache has decided that we should use another licence notice in the
   file headers. You could just copy the new text from another file
   (you might need to run 'svn up' since this was changed recently).

8) The RAFContainer4 class and the RAFContainerFactory class could be
   package protected, not public.

9) In RAFContainer4, ourChannel and iosInProgress could be private.

10) In RAFContainerFactory, the comment to rafContainerConstructor
    says "Immutable, initialized by constructor." Since this is the
    case, I would prefer that it was made explicit in the code by
    declaring it as final. And it could be made private (and maybe
    static).

11) RAFContainer.padFile() was changed from private to protected, but
    that is not needed since padFile() isn't used in RAFContainer4.

12) Javadoc for RAFContainer4.writeFully() says "readFully" instead of
    "writeFully".

13) The javadocs for the classes look great! If you could add <p>
    between the paragraphs, they would look great after they have been
    transformed into HTML files as well. :)

14) RAFContainerFactory.newRAFContainer() could be written more
    compactly if "return new RAFContainer(factory);" were moved out of
    the catch clause and the else clause. :)

With the exception of 3 and 4, these comments are really minor
nits. (I was in a picky mood today, sorry! ;) ). You have done a great
job! Thank you very much!

> Allow parallel access to data files.
> ------------------------------------
>
>                 Key: DERBY-801
>                 URL: http://issues.apache.org/jira/browse/DERBY-801
>             Project: Derby
>          Issue Type: Improvement
>          Components: Performance, Store
>    Affects Versions: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.1.1.1, 10.1.1.2, 10.1.2.0, 10.1.2.1
>         Environment: Any
>            Reporter: Øystein Grøvlen
>         Assigned To: Anders Morken
>         Attachments: DERBY-801-v2.patch, NIO-RAFContainer-v1.patch
>
>
> Derby currently serializes accesses to a data file.  For example, the
> implementation of RAFContainer.readPage is as follows:
>     synchronized (this) {  // 'this' is a FileContainer, i.e. a file object
>         fileData.seek(pageOffset);  // fileData is a RandomAccessFile
>         fileData.readFully(pageData, 0, pageSize);
>     }
> I have experiemented with a patch where I have introduced several file
> descriptors (RandomAccessFile objects) per RAFContainer.  These are
> used for reading.  The principle is that when all readers are busy, a
> readPage request will create a new reader.  (There is a maximum number
> of readers.)  With this patch, throughput was improved by 50% on
> linux.  For more discussion on this, see
> http://www.nabble.com/Derby-I-O-issues-during-checkpointing-t473523.html
> The challenge with the suggested approach is to make a mechanism to
> limit the number of open file descpriptors.  Mike Matrigali has
> suggested to use the existing CacheManager infrastructure for this
> purpose.  For a discussion on that, see:
> http://www.nabble.com/new-uses-for-basic-services-cache---looking-for-advice-t756863.html

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira