Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org> on 2005/11/30 17:03:30 UTC

[jira] Created: (DERBY-733) Starvation in RAFContainer.readPage()

Starvation in RAFContainer.readPage()
-------------------------------------

         Key: DERBY-733
         URL: http://issues.apache.org/jira/browse/DERBY-733
     Project: Derby
        Type: Bug
  Components: Performance, Store  
    Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2    
 Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
    Reporter: Knut Anders Hatlen
 Assigned to: Knut Anders Hatlen 


When Derby is completely disk bound, threads might be starved in
RAFContainer.readPage(). This is a real problem when multiple clients
are repeatedly accessing one or a small number of large tables. In
cases like this, I have observed very high maximum response times
(several minutes in the worst cases) on simple transactions. The
average response time is not affected by this.

The starvation is caused by a synchronized block in
RAFContainer.readPage():

  synchronized (this) {
      fileData.seek(pageOffset);
      fileData.readFully(pageData, 0, pageSize);
  }

If many threads want to read pages from the same file, there will be a
long queue of threads waiting for this monitor. Since the Java
specification does not guarantee that threads waiting for monitors are
treated fairly, some threads might have to wait for a long time before
they get the monitor. (Usually, a couple of threads get full throughput
while the others have to wait.)
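A minimal sketch of a fairness-based alternative, assuming a fair
java.util.concurrent.locks.ReentrantLock around the same seek/read pair
(the PageReader wrapper below is made up for illustration; the field and
parameter names match the snippet above). With a fair lock, waiting
threads acquire the lock in FIFO order instead of at the JVM's
discretion:

  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.util.concurrent.locks.ReentrantLock;

  class PageReader {
      private final RandomAccessFile fileData;
      // true = fair: the longest-waiting thread is granted the lock next
      private final ReentrantLock readLock = new ReentrantLock(true);

      PageReader(RandomAccessFile fileData) {
          this.fileData = fileData;
      }

      void readPage(long pageOffset, byte[] pageData, int pageSize)
              throws IOException {
          readLock.lock();
          try {
              fileData.seek(pageOffset);
              fileData.readFully(pageData, 0, pageSize);
          } finally {
              readLock.unlock();
          }
      }
  }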


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
I am looking at committing this.

Knut Anders Hatlen (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]
> 
> Knut Anders Hatlen updated DERBY-733:
> -------------------------------------
> 
>     Attachment: DERBY-733.diff
> 
> Mike, I agree that a pool of open file descriptors is a good idea, but
> you will run into the same problem with highly unstable response times
> when the number of threads accessing the same file exceeds the number
> of file descriptors. I think we should use ReentrantLock for Java 1.5
> and higher, since the introduction of this class has allowed the
> implementers of JVMs to prioritize throughput over fairness in the
> handling of monitors. We should file a separate enhancement request
> for the file descriptor pool.
> 
> I have attached a patch which invokes ReentrantLock.lock() and
> unlock() when reading in a page from disk. I did not build my own
> ReentrantLock replacement, as I said I would. Instead, I have used
> reflection to enable this feature if the JVM supports it. This seemed
> like an easier approach, and I also discovered that the handling of
> threads waiting for monitors had changed between 1.4 and 1.5 and that
> this issue was not so serious on 1.4.
> 
> The maximum response time was drastically reduced in the disk-bound
> case. Derbyall ran successfully on both Sun JVM 1.4.2 and 1.5.0. I
> have also tested the performance, and I could not see any change in
> throughput or CPU usage. (The performance test was run with a very
> small page cache and with a database that was many times bigger than
> the page cache, but smaller than the file system cache. This way,
> Derby called readPage() very often, but it was CPU-bound since the
> requested page always was in the file system cache.)
> 
> Could someone please review this patch?
> 
> % svn stat
> M      java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
> 
> 
>>Starvation in RAFContainer.readPage()
>>-------------------------------------
>>
>>         Key: DERBY-733
>>         URL: http://issues.apache.org/jira/browse/DERBY-733
>>     Project: Derby
>>        Type: Improvement
>>  Components: Performance, Store
>>    Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>> Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>>    Reporter: Knut Anders Hatlen
>>    Assignee: Knut Anders Hatlen
>> Attachments: DERBY-733.diff
>>
>>When Derby is completely disk bound, threads might be starved in
>>RAFContainer.readPage(). This is a real problem when multiple clients
>>are repeatedly accessing one or a small number of large tables. In
>>cases like this, I have observed very high maximum response times
>>(several minutes in the worst cases) on simple transactions. The
>>average response time is not affected by this.
>>The starvation is caused by a synchronized block in
>>RAFContainer.readPage():
>>  synchronized (this) {
>>      fileData.seek(pageOffset);
>>      fileData.readFully(pageData, 0, pageSize);
>>  }
>>If many threads want to read pages from the same file, there will be a
>>long queue of threads waiting for this monitor. Since the Java
>>specification does not guarantee that threads waiting for monitors are
>>treated fairly, some threads might have to wait for a long time before
>>they get the monitor. (Usually, a couple of threads get full throughput
>>while the others have to wait.)
> 
> 


Re: [jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> I have reviewed and committed the following patch.  I have the following
> comments which weren't enough to hold up this patch:
>
> 1) Will you at some point be contributing the performance tests which
>     you used to verify this fix, and the original test which showed the
>     problem?  I know this is a hard problem, just wondering if you are
>     working on solving it.

I don't think I can contribute the test that found the problem, as it
is part of a larger test framework. I can however make a small test
that reproduces the problem and attach it to the JIRA issue.

> 2) minor nit - could you try to keep lines under 80 chars.

Ok, sorry about that.

> 3) I was a little surprised that catch blocks did not do anything
>     on a released system.  Then thinking about it, it seemed ok
>     since the added code is not actually using the lock mechanism to
>     protect anything, but instead just to schedule the enclosed work
>     more fairly.  In fact in jdk1.4 systems the code has to work
>     without the lock calls at all.  I have no idea what kind of
>     exceptions might happen.

The exceptions that Method.invoke() declares are
IllegalAccessException, IllegalArgumentException and
InvocationTargetException. lock() does not throw exceptions, but
unlock() throws IllegalMonitorStateException if the current thread is
not the owner of the lock. My reasoning was that this could only
happen if someone put a bogus ReentrantLock class in their classpath,
which is highly unlikely, and in that case we could just fall back to
pre-1.5 behaviour.

>     The only case that really concerned me
>     is if lock worked but unlock failed - you may want to think about
>     that case and see if you should raise an exception.

The only case in which unlock() fails is when lock() has failed or
hasn't been invoked. We can easily address this by adding
hasJava5FairLocks = false to the catch blocks and falling back to the
old behaviour.
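
To make that fallback concrete, here is a minimal sketch (not the
attached patch; only the hasJava5FairLocks name comes from this
discussion, while the FairReadFallback class and the other member names
are made up) of how the reflective lock()/unlock() calls could wrap the
existing synchronized block and quietly degrade to pre-1.5 behaviour if
reflection misbehaves:

  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.lang.reflect.Method;

  class FairReadFallback {
      private RandomAccessFile fileData;
      private volatile boolean hasJava5FairLocks; // cleared if reflection misbehaves
      private Object fairLock;                    // a ReentrantLock, if the JVM has one
      private Method lockMethod;                  // ReentrantLock.lock()
      private Method unlockMethod;                // ReentrantLock.unlock()

      void readPage(long pageOffset, byte[] pageData, int pageSize)
              throws IOException {
          boolean locked = false;
          if (hasJava5FairLocks) {
              try {
                  lockMethod.invoke(fairLock, new Object[0]);
                  locked = true;
              } catch (Exception e) {
                  // Give up on fair locking and fall back to pre-1.5 behaviour.
                  hasJava5FairLocks = false;
              }
          }
          try {
              synchronized (this) { // the original monitor still guards the file position
                  fileData.seek(pageOffset);
                  fileData.readFully(pageData, 0, pageSize);
              }
          } finally {
              if (locked) {
                  try {
                      unlockMethod.invoke(fairLock, new Object[0]);
                  } catch (Exception e) {
                      // e.g. IllegalMonitorStateException from a bogus class on the classpath
                      hasJava5FairLocks = false;
                  }
              }
          }
      }
  }

If lock() fails, the page is still read under the original monitor, and
a failing unlock() simply disables the feature for later calls.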

>     May be useful to add comments why no exception handling is necessary.

I will do that.

Thanks for reviewing and committing,

Knut Anders


Re: [jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
I have reviewed and committed the following patch.  I have the following
comments which weren't enough to hold up this patch:

1) Will you at some point be contributing the performance tests which
    you used to verify this fix, and the original test which showed the
    problem?  I know this is a hard problem, just wondering if you are
    working on solving it.
2) minor nit - could you try to keep lines under 80 chars.
3) I was a little surprised that catch blocks did not do anything
    on a released system.  Then thinking about it, it seemed ok
    since the added code is not actually using the lock mechanism to
    protect anything, but instead just to schedule the enclosed work
    more fairly.  In fact in jdk1.4 systems the code has to work
    without the lock calls at all.  I have no idea what kind of
    exceptions might happen.  The only case that really concerned me
    is if lock worked but unlock failed - you may want to think about
    that case and see if you should raise an exception.

    May be useful to add comments why no exception handling is necessary.

Knut Anders Hatlen (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]
> 
> Knut Anders Hatlen updated DERBY-733:
> -------------------------------------
> 
>     Attachment: DERBY-733.diff
> 
> Mike, I agree that a pool of open file descriptors is a good idea, but
> you will run into the same problem with highly unstable response times
> when the number of threads accessing the same file exceeds the number
> of file descriptors. I think we should use ReentrantLock for Java 1.5
> and higher, since the introduction of this class has allowed the
> implementers of JVMs to prioritize throughput over fairness in the
> handling of monitors. We should file a separate enhancement request
> for the file descriptor pool.
> 
> I have attached a patch which invokes ReentrantLock.lock() and
> unlock() when reading in a page from disk. I did not build my own
> ReentrantLock replacement, as I said I would. Instead, I have used
> reflection to enable this feature if the JVM supports it. This seemed
> like an easier approach, and I also discovered that the handling of
> threads waiting for monitors had changed between 1.4 and 1.5 and that
> this issue was not so serious on 1.4.
> 
> The maximum response time was drastically reduced in the disk-bound
> case. Derbyall ran successfully on both Sun JVM 1.4.2 and 1.5.0. I
> have also tested the performance, and I could not see any change in
> throughput or CPU usage. (The performance test was run with a very
> small page cache and with a database that was many times bigger than
> the page cache, but smaller than the file system cache. This way,
> Derby called readPage() very often, but it was CPU-bound since the
> requested page always was in the file system cache.)
> 
> Could someone please review this patch?
> 
> % svn stat
> M      java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
> 
> 
>>Starvation in RAFContainer.readPage()
>>-------------------------------------
>>
>>         Key: DERBY-733
>>         URL: http://issues.apache.org/jira/browse/DERBY-733
>>     Project: Derby
>>        Type: Improvement
>>  Components: Performance, Store
>>    Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>> Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>>    Reporter: Knut Anders Hatlen
>>    Assignee: Knut Anders Hatlen
>> Attachments: DERBY-733.diff
>>
>>When Derby is completely disk bound, threads might be starved in
>>RAFContainer.readPage(). This is a real problem when multiple clients
>>are repeatedly accessing one or a small number of large tables. In
>>cases like this, I have observed very high maximum response times
>>(several minutes in the worst cases) on simple transactions. The
>>average response time is not affected by this.
>>The starvation is caused by a synchronized block in
>>RAFContainer.readPage():
>>  synchronized (this) {
>>      fileData.seek(pageOffset);
>>      fileData.readFully(pageData, 0, pageSize);
>>  }
>>If many threads want to read pages from the same file, there will be a
>>long queue of threads waiting for this monitor. Since the Java
>>specification does not guarantee that threads waiting for monitors are
>>treated fairly, some threads might have to wait for a long time before
>>they get the monitor. (Usually, a couple of threads get full throughput
>>while the others have to wait.)
> 
> 


[jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]

Knut Anders Hatlen updated DERBY-733:
-------------------------------------

    Attachment: DERBY-733.diff

Mike, I agree that a pool of open file descriptors is a good idea, but
you will run into the same problem with highly unstable response times
when the number of threads accessing the same file exceeds the number
of file descriptors. I think we should use ReentrantLock for Java 1.5
and higher, since the introduction of this class has allowed the
implementers of JVMs to prioritize throughput over fairness in the
handling of monitors. We should file a separate enhancement request
for the file descriptor pool.

I have attached a patch which invokes ReentrantLock.lock() and
unlock() when reading in a page from disk. I did not build my own
ReentrantLock replacement, as I said I would. Instead, I have used
reflection to enable this feature if the JVM supports it. This seemed
like an easier approach, and I also discovered that the handling of
threads waiting for monitors had changed between 1.4 and 1.5 and that
this issue was not so serious on 1.4.
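
For readers unfamiliar with the reflection trick, a minimal sketch (the
FairLockSupport class and its fields are illustrative and are not taken
from DERBY-733.diff) of how a fair ReentrantLock can be obtained at
runtime without compiling against Java 1.5:

  import java.lang.reflect.Method;

  class FairLockSupport {
      // True only if the running JVM provides java.util.concurrent.locks.ReentrantLock.
      boolean hasJava5FairLocks = false;
      Object fairLock;      // the ReentrantLock instance, held as Object so this loads on 1.4
      Method lockMethod;    // ReentrantLock.lock()
      Method unlockMethod;  // ReentrantLock.unlock()

      FairLockSupport() {
          try {
              Class lockClass =
                  Class.forName("java.util.concurrent.locks.ReentrantLock");
              // new ReentrantLock(true) requests a fair (FIFO) lock.
              fairLock = lockClass.getConstructor(new Class[] { Boolean.TYPE })
                                  .newInstance(new Object[] { Boolean.TRUE });
              lockMethod = lockClass.getMethod("lock", new Class[0]);
              unlockMethod = lockClass.getMethod("unlock", new Class[0]);
              hasJava5FairLocks = true;
          } catch (Exception e) {
              // Pre-1.5 JVM or reflection failure: keep the old synchronized-only behaviour.
              hasJava5FairLocks = false;
          }
      }
  }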

The maximum response time was drastically reduced in the disk-bound
case. Derbyall ran successfully on both Sun JVM 1.4.2 and 1.5.0. I
have also tested the performance, and I could not see any change in
throughput or CPU usage. (The performance test was run with a very
small page cache and with a database that was many times bigger than
the page cache, but smaller than the file system cache. This way,
Derby called readPage() very often, but it was CPU-bound since the
requested page always was in the file system cache.)

Could someone please review this patch?

% svn stat
M      java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>  Attachments: DERBY-733.diff
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)



[jira] Closed: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]
     
Knut Anders Hatlen closed DERBY-733:
------------------------------------

    Fix Version: 10.2.0.0
     Resolution: Fixed

Fixed in revision 357275. Thanks for taking the time to review and commit, Mike.

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>      Fix For: 10.2.0.0
>  Attachments: DERBY-733-more-exception-handling.diff, DERBY-733.diff, Insert.java, Select.java
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)



Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Daniel John Debrunner <dj...@debrunners.com> writes:

> Mike Matrigali wrote:
>> You have raised some issues about this patch.  In the apache
>> commit/review model, should I be raising a vote on the patch?  I
>> think this basically comes down to whether you feel strongly enough
>> to vote -1 on such a vote.
>
> Any patch implicitly has a vote; there is no need to raise such a thing.
> If I wanted to vote -1 I would do so.
>
>> I don't think Knut plans on building a separate module for this
>> locking stuff, at least not until it is used in more than one place.
>> He should comment.

That's correct, I don't plan on doing that. Not right now, at
least. Of course, if someone voted -1 the plans would probably
change. ;)

>> I am ok with the patch, and would go forward with his subsequent
>> patch recently submitted as it addressed some of my concerns (I have
>> not had a chance yet to review it).  I agree the separate module
>> approach is
>> better, but that is not what was submitted.  I believe I would
>> not commit a proliferation of the same kinds of changes in multiple
>> files.
>> 
>> I am hoping that this patch leads to more work in this area, identifying
>> the next bottleneck and next change and it may become clearer what major
>> changes need to happen.
>
> Agreed, though it does concern me a little that there is an existing
> mechanism for ensuring single threaded access to an object with queuing.
>  It seems that one of the reasons it was not used was that someone had a
> question about whether it could be used, but that question was not raised on
> the list until after the patch was submitted and committed.
>
> I don't want to see a trend where people add code that replicates
> existing internal functionality because they don't understand the
> existing functionality and never ask about it on the list.

I admit that you are correct when you are saying that I don't
understand the functionality of the lock manager. However, saying that
I never asked about it on the list is not entirely true. I did post a
description of the problem on the list. And I did post a description
of what functionality was needed to fix the problem and how I planned
to implement it. Using existing functionality is of course a much
better solution than the ugly hack I came up with, but I never hid my
intentions from the list. If someone knew that Derby already had that
kind of functionality, nothing prevented them from letting me know.

-- 
Knut Anders


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Daniel John Debrunner <dj...@debrunners.com>.
Mike Matrigali wrote:
> You have raised some issues about this patch.  In the apache
> commit/review model, should I be raising a vote on the patch?  I
> think this basically comes down to whether you feel strongly enough
> to vote -1 on such a vote.

Any patch implicitly has a vote; there is no need to raise such a thing.
If I wanted to vote -1 I would do so.

> I don't think Knut plans on building a separate module for this
> locking stuff, at least not until it is used in more than one place.
> He should comment.
> 
> I am ok with the patch, and would go forward with his subsequent
> patch recently submitted as it addressed some of my concerns (I have
> not had a chance yet to review it).  I agree the separate module
> approach is
> better, but that is not what was submitted.  I believe I would
> not commit a proliferation of the same kinds of changes in multiple
> files.
> 
> I am hoping that this patch leads to more work in this area, identifying
> the next bottleneck and next change and it may become clearer what major
> changes need to happen.

Agreed, though it does concern me a little that there is an existing
mechanism for ensuring single threaded access to an object with queuing.
 It seems that one of the reasons it was not used was that someone had a
question about whether it could be used, but that question was not raised on
the list until after the patch was submitted and committed.

I don't want to see a trend where people add code that replicates
existing internal functionality because they don't understand the
existing functionality and never ask about it on the list.

Dan.


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
You have raised some issues about this patch.  In the apache
commit/review model, should I be raising a vote on the patch?  I
think this basically comes down to whether you feel strongly enough
to vote -1 on such a vote.

I don't think Knut plans on building a separate module for this
locking stuff, at least not until it is used in more than one place.
He should comment.

I am ok with the patch, and would go forward with his subsequent
patch recently submitted as it addressed some of my concerns (I have
not had a chance yet to review it).  I agree the separate module approach is
better, but that is not what was submitted.  I believe I would
not commit a proliferation of the same kinds of changes in multiple
files.

I am hoping that this patch leads to more work in this area, identifying
the next bottleneck and next change and it may become clearer what major
changes need to happen.

Daniel John Debrunner (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12360463 ] 
> 
> Daniel John Debrunner commented on DERBY-733:
> ---------------------------------------------
> 
> The issue I have with this current patch is that it is localized to one use in RAFContainer, when reading files.
> Most likely there are other locations where such a facility would be useful, especially on the write path for RAFContainer.
> Are we going to have similar if (java5) statements everywhere? The module API already supports loading different
> code for different environments. I think that this functionality could be added to the LockManager or maybe a separate
> module. This would be an improvement, but I think it would be a mistake to have similar code to this patch in many
> areas of Derby. And yes, maybe we could work on improving Latch performance along these lines.
> 
> 
>>Starvation in RAFContainer.readPage()
>>-------------------------------------
>>
>>         Key: DERBY-733
>>         URL: http://issues.apache.org/jira/browse/DERBY-733
>>     Project: Derby
>>        Type: Improvement
>>  Components: Performance, Store
>>    Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>> Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>>    Reporter: Knut Anders Hatlen
>>    Assignee: Knut Anders Hatlen
>> Attachments: DERBY-733.diff
>>
>>When Derby is completely disk bound, threads might be starved in
>>RAFContainer.readPage(). This is a real problem when multiple clients
>>are repeatedly accessing one or a small number of large tables. In
>>cases like this, I have observed very high maximum response times
>>(several minutes in the worst cases) on simple transactions. The
>>average response time is not affected by this.
>>The starvation is caused by a synchronized block in
>>RAFContainer.readPage():
>>  synchronized (this) {
>>      fileData.seek(pageOffset);
>>      fileData.readFully(pageData, 0, pageSize);
>>  }
>>If many threads want to read pages from the same file, there will be a
>>long queue of threads waiting for this monitor. Since the Java
>>specification does not guarantee that threads waiting for monitors are
>>treated fairly, some threads might have to wait for a long time before
>>they get the monitor. (Usually, a couple of threads get full throughput
>>while the others have to wait.)
> 
> 


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "ST" == Suresh Thalamati <su...@gmail.com> writes:

    ST> This  might  be  obvious, thought  I  would  mention  it any  way.  My
    ST> understanding is  one can  not just enable  "RWD" (direct io)  for the
    ST> checkpoint. It  has to  be enabled  for all the  writes from  the page
    ST> cache, otherwise  a file  sync is required  before doing  "rwd" writes
    ST> because I  am not sure If  a file is opened  in "rw" mode  and then in
    ST> "rws" mode , writes  to first open will also get synced  to the disk ,
    ST> when file is opened in "rwd" mode , I doubt that.


    ST> If  files are  opened  in direct  io  mode always  ,  then page  cache
    ST> cleaning can possible  get slow and also user query  request for a new
    ST> page in buffer pool can become slow  if a cache is full and a page has
    ST> to be thrown out to get a  free page. Another thing to note is buffer
    ST> cleaning is done  on Rawstore daemon thread, which  is overloaded with
    ST> some post commit  work also , so page cache may  not get cleaned often
    ST> in some cases.

I think the solution to this is to use multiple threads for writing
pages to disk.  In order not to slow down user threads, one can copy
the page to another buffer and let it be written asynchronously by
another thread using direct i/o (emulating a file system buffer).
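
A rough sketch of that idea (the AsyncPageWriter class, its names and
the single-queue design are assumptions for illustration, not an agreed
design): the caller hands off a private copy of the page and returns
immediately, while a background thread drains the queue and performs
the actual write.

  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.util.LinkedList;

  class AsyncPageWriter extends Thread {
      static class WriteRequest {
          final RandomAccessFile file; // e.g. opened in "rwd" mode so the write reaches the disk
          final long offset;
          final byte[] copyOfPage;     // private copy, so the caller can reuse its buffer at once
          WriteRequest(RandomAccessFile file, long offset, byte[] page) {
              this.file = file;
              this.offset = offset;
              this.copyOfPage = (byte[]) page.clone();
          }
      }

      private final LinkedList queue = new LinkedList();

      // Called by user threads or the cache cleaner: enqueue and return without doing I/O.
      public synchronized void submit(WriteRequest request) {
          queue.addLast(request);
          notifyAll();
      }

      private synchronized WriteRequest take() throws InterruptedException {
          while (queue.isEmpty()) {
              wait();
          }
          return (WriteRequest) queue.removeFirst();
      }

      public void run() {
          try {
              while (true) {
                  WriteRequest r = take();
                  synchronized (r.file) {      // one write per file at a time
                      r.file.seek(r.offset);
                      r.file.write(r.copyOfPage);
                  }
              }
          } catch (InterruptedException stop) {
              // shut down
          } catch (IOException ioe) {
              // a real implementation would report this back to the buffer manager
          }
      }
  }

A pool of such writers, for instance one per disk, would provide the
multiple concurrent writes discussed above.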

-- 
Øystein


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Francois Orsini <fr...@gmail.com>.
Agreed. I am not sure we can wait forever for Java to provide asynchronous
I/O - it was supposed to be part of NIO.2 and has now been postponed to
Java SE 7.0 - so coming up with Derby's own async I/O layer (with worker
threads) seems to make more sense all of a sudden...

On 12/19/05, Øystein Grøvlen <Oy...@sun.com> wrote:
>
> >>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
>
>     MM> you are right, I'll have to think about this some more.  Until
> java
>     MM> gets async, guaranteed sync'd to disk writes I think we should
> continue
>     MM> to use the current method for user initiated writes.
>
> Or we can implement the async writes ourselves by using designated
> write threads.
>
> --
> Øystein
>
>

Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> you are right, I'll have to think about this some more.  Until java
    MM> gets async, guaranteed sync'd to disk writes I think we should continue
    MM> to use the current method for user initiated writes.

Or we can implement the async writes ourselves by using designated
write threads.

-- 
Øystein


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
You are right, I'll have to think about this some more.  Until Java gets
asynchronous, guaranteed-synced-to-disk writes, I think we should continue
to use the current method for user-initiated writes.

Suresh Thalamati wrote:
> This might be obvious, thought I would mention it any way. My 
> understanding is one can not just enable "RWD" (direct io) for the 
> checkpoint. It has to be enabled for all the writes from the page cache, 
> otherwise a file sync is required before doing "rwd" writes because I am 
> not sure If  a file is opened in "rw" mode  and then in "rws" mode , 
> writes to first open will also get synced to the disk , when file is 
> opened in "rwd" mode , I doubt that.
> 
> If files are opened in direct io mode always , then  page cache cleaning 
> can possible get slow and also user query request for a new page in 
> buffer pool can become slow if a cache is full and a page has to be 
> thrown out to get a free page.  Another thing to note is buffer cleaning 
> is done on Rawstore  daemon thread, which  is overloaded with some post 
> commit work also , so page cache may not get cleaned often in some cases.
> 
> 
> Thanks
> -suresht
> 
> 
> Mike Matrigali wrote:
> 
>> excellent, I look forward to your work on concurrent I/O.  I am likely
>> to not be on the list much for the next 2 weeks, so won't be able to
>> help much.  In thinking about this issue I was hoping that somehow
>> the current container cache could be enhanced to support more than
>> one open container per container.  Then one would automatically get
>> control over the open file resource across all containers, by setting
>> the currently supported "max" on the container pool.
>>
>> The challenge is that this would be a new concept for the basic services
>> cache implementation.  What we want is a cache that supports multiple
>> objects with the same key, and that returns an available one if another
>> one is "busy".  Also returns a newly opened one, if all are busy.  I
>> am going to start a thread on this, to see if any other help is
>> available.  If possible I like this approach better than having a 
>> queue of open files per container where it hard to control the growth 
>> of one queue vs. the growth in another.
>>
>> On the checkpoint issue, I would not have a problem with changes to the
>> current mechanism to do "rwd" type sync I/O rather than sync at end 
>> (but we will have to support both until we don't have to support older 
>> versions of JVM's).  I believe this is as close
>> to "direct i/o" as we can get from java - if you mean something 
>> different here let me know.  The benefit is that I believe it will fix
>> the checkpoint flooding the I/O system problem.  The downside is that
>> it will cause total number of I/O's to increase in cases where the
>> derby block size is smaller than the filesystem/disk blocksize -- 
>> assuming the OS currently converts our flood of multiple async writes 
>> to the same file to a smaller number of bigger I/O's.  I think this 
>> trade off is fine for checkpoints.  If checkpoint efficiency is an 
>> issue, there are a number of other ways to address it in the future.
>>
>> Øystein Grøvlen wrote:
>>
>>>>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
>>>
>>>
>>>
>>>
>>>     MM> user thread initiated read
>>>     MM>      o should  be high priority and  should be "fair"  with 
>>> other user
>>>     MM>        initiated reads.
>>>
>>>     MM>      o These happen anytime a read of a row causes a cache miss.
>>>     MM>      o Currently only one I/O operation to a file can happen 
>>> at a time,
>>>     MM>        could be big problem for some types of multi-threaded,
>>>     MM>        highly concurrent low number of table apps.  I think
>>>     MM>        the path here should be to increase the number of
>>>     MM>        concurrent I/O's allowed to be outstanding by allowing
>>>     MM>        each thread to have 1 (assuming sufficient open file
>>>     MM>        resources).  100 outstanding I/O's to a single file may
>>>     MM>        be overkill, but in java we can't know that the file is
>>>     MM>        not actually 100 disks underneath.  The number of I/O's
>>>     MM>        should grow as the actual application load increases,
>>>     MM>        note I still think max I/O's should be tied to number
>>>     MM>        of user threads, plus maybe a small number for
>>>     MM>        background processing.
>>>
>>> There was an interesting paper at the last VLDB conference that
>>> discussed the virtue of having many outstanding I/O requests:
>>>     http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf (paper)
>>>     http://www.vldb2005.org/program/slides/wed/s1116-hall.pdf (slides)
>>>
>>> The basic message is that many outstanding requests are good.  The
>>> SCSI controller they used in their study was able to handle 32
>>> concurrent requests.  One reason database systems have been
>>> conservative with respect to outstanding requests is that they want to
>>> control the priority of the I/O requests.  We would like user thread
>>> initiated requests to have priority over checkpoint initiated writes.
>>> (The authors suggest building priorities into the file system to solve
>>> this.)
>>>
>>> I plan to start working on a patch for allowing more concurrency
>>> between readers within a few weeks.  The main challenge is to find the
>>> best way to organize the open file descriptors (reuse, limit the max.
>>> number etc.)  I will file a JIRA for this.
>>>
>>> I also think we should consider mechanisms for read ahead.
>>>
>>>     MM> user thread initiated write
>>>     MM>       o same issues as user initiated read.
>>>     MM>       o happens way less than read, as it should only happen 
>>> on a cache
>>>     MM>         miss that can't find a non-dirty page in the cache.  
>>> background
>>>     MM>         cache cleaner  should be  keeping this from  
>>> happening, though
>>>     MM>         apps that only do updates and cause cache hits are 
>>> worst case.
>>>
>>>
>>>     MM> checkpoint initiated write:
>>>     MM>       o sometimes too many checkpoints happen in too short a 
>>> time.
>>>     MM>       o needs an improved scheduling algorithm, currently 
>>> just defaults
>>>     MM>         to N number of bytes to the log file no matter what 
>>> the speed of
>>>     MM>         log writes are.
>>>     MM>       o currently may flood the I/O system causing user 
>>> reads/writes to
>>>     MM>         stall - on  some OS/JVM's this stall is  amazing like 
>>> ten's of
>>>     MM>         seconds.
>>>     MM>       o  It is not  important that  checkpoints run  fast, 
>>> it  is more
>>>     MM>         important  that it  prodede methodically  to  
>>> conclusion while
>>>     MM>         causing a little  interuption to "real" work by  user 
>>> threads.     MM>         Various approaches to this were discussed, 
>>> but no patches yet.
>>>
>>> For the scheduling of checkpoints, I was hoping Raymond would come up
>>> with something.  Raymond are you still with us?
>>>
>>> I have discussed our I/O architecture with Solaris engineers, and our
>>> approach of doing buffered writes followed by a fsync, I was told was
>>> the worst approach on Solaris.  They recommended using direct I/O.  I
>>> guess there will be situations were single-threaded direct I/O for
>>> checkpointing will give too low throughput.  In that case, we could
>>> consider a pool of writers.  The challenge would then be how to give
>>> priority to user-initiated requests over multi-threaded checkpoint
>>> writes as discussed above.
>>>
>>
>>
> 
> 
> 


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Suresh Thalamati <su...@gmail.com>.
This might be obvious, but I thought I would mention it anyway. My
understanding is that one cannot just enable "rwd" (direct I/O) for the
checkpoint. It has to be enabled for all the writes from the page
cache; otherwise a file sync is required before doing "rwd" writes,
because I am not sure whether, if a file is opened in "rw" mode and then
in "rws" mode, writes through the first open will also get synced to the
disk; when the file is opened in "rwd" mode, I doubt that they will.
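
For reference, a small sketch of the java.io.RandomAccessFile modes being
discussed (standard JDK behaviour, not Derby code): "rwd" pushes the file
content to the storage device on every write, while a plain "rw" handle
needs an explicit sync, which is what the checkpoint pays for today.

  import java.io.File;
  import java.io.IOException;
  import java.io.RandomAccessFile;

  class WriteModes {
      // "rwd": every write() updates the file content on the storage device before
      // returning (file metadata is not necessarily flushed; "rws" would flush both).
      static RandomAccessFile openSyncedData(File f) throws IOException {
          return new RandomAccessFile(f, "rwd");
      }

      // "rw": writes may sit in the file system buffer; they must be forced out explicitly.
      static void writeThenSync(RandomAccessFile rw, long offset, byte[] page)
              throws IOException {
          rw.seek(offset);
          rw.write(page);
          rw.getFD().sync(); // force buffered writes for this descriptor to the device
      }
  }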

If files are always opened in direct I/O mode, then page cache cleaning
can possibly get slow, and a user query's request for a new page in the
buffer pool can also become slow if the cache is full and a page has to
be thrown out to get a free page.  Another thing to note is that buffer
cleaning is done on the RawStore daemon thread, which is also overloaded
with some post-commit work, so the page cache may not get cleaned often
in some cases.


Thanks
-suresht


Mike Matrigali wrote:
> excellent, I look forward to your work on concurrent I/O.  I am likely
> to not be on the list much for the next 2 weeks, so won't be able to
> help much.  In thinking about this issue I was hoping that somehow
> the current container cache could be enhanced to support more than
> one open container per container.  Then one would automatically get
> control over the open file resource across all containers, by setting
> the currently supported "max" on the container pool.
> 
> The challenge is that this would be a new concept for the basic services
> cache implementation.  What we want is a cache that supports multiple
> objects with the same key, and that returns an available one if another
> one is "busy".  Also returns a newly opened one, if all are busy.  I
> am going to start a thread on this, to see if any other help is
> available.  If possible I like this approach better than having a queue 
> of open files per container where it hard to control the growth of one 
> queue vs. the growth in another.
> 
> On the checkpoint issue, I would not have a problem with changes to the
> current mechanism to do "rwd" type sync I/O rather than sync at end (but 
> we will have to support both until we don't have to support older 
> versions of JVM's).  I believe this is as close
> to "direct i/o" as we can get from java - if you mean something 
> different here let me know.  The benefit is that I believe it will fix
> the checkpoint flooding the I/O system problem.  The downside is that
> it will cause total number of I/O's to increase in cases where the
> derby block size is smaller than the filesystem/disk blocksize -- 
> assuming the OS currently converts our flood of multiple async writes to 
> the same file to a smaller number of bigger I/O's.  I think this trade 
> off is fine for checkpoints.  If checkpoint efficiency is an issue, 
> there are a number of other ways to address it in the future.
> 
> Øystein Grøvlen wrote:
> 
>>>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
>>
>>
>>
>>     MM> user thread initiated read
>>     MM>      o should  be high priority and  should be "fair"  with 
>> other user
>>     MM>        initiated reads.
>>
>>     MM>      o These happen anytime a read of a row causes a cache miss.
>>     MM>      o Currently only one I/O operation to a file can happen 
>> at a time,
>>     MM>        could be big problem for some types of multi-threaded,
>>     MM>        highly concurrent low number of table apps.  I think
>>     MM>        the path here should be to increase the number of
>>     MM>        concurrent I/O's allowed to be outstanding by allowing
>>     MM>        each thread to have 1 (assuming sufficient open file
>>     MM>        resources).  100 outstanding I/O's to a single file may
>>     MM>        be overkill, but in java we can't know that the file is
>>     MM>        not actually 100 disks underneath.  The number of I/O's
>>     MM>        should grow as the actual application load increases,
>>     MM>        note I still think max I/O's should be tied to number
>>     MM>        of user threads, plus maybe a small number for
>>     MM>        background processing.
>>
>> There was an interesting paper at the last VLDB conference that
>> discussed the virtue of having many outstanding I/O requests:
>>     http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf (paper)
>>     http://www.vldb2005.org/program/slides/wed/s1116-hall.pdf (slides)
>>
>> The basic message is that many outstanding requests are good.  The
>> SCSI controller they used in their study was able to handle 32
>> concurrent requests.  One reason database systems have been
>> conservative with respect to outstanding requests is that they want to
>> control the priority of the I/O requests.  We would like user thread
>> initiated requests to have priority over checkpoint initiated writes.
>> (The authors suggest building priorities into the file system to solve
>> this.)
>>
>> I plan to start working on a patch for allowing more concurrency
>> between readers within a few weeks.  The main challenge is to find the
>> best way to organize the open file descriptors (reuse, limit the max.
>> number etc.)  I will file a JIRA for this.
>>
>> I also think we should consider mechanisms for read ahead.
>>
>>     MM> user thread initiated write
>>     MM>       o same issues as user initiated read.
>>     MM>       o happens way less than read, as it should only happen 
>> on a cache
>>     MM>         miss that can't find a non-dirty page in the cache.  
>> background
>>     MM>         cache cleaner  should be  keeping this from  
>> happening, though
>>     MM>         apps that only do updates and cause cache hits are 
>> worst case.
>>
>>
>>     MM> checkpoint initiated write:
>>     MM>       o sometimes too many checkpoints happen in too short a 
>> time.
>>     MM>       o needs an improved scheduling algorithm, currently just 
>> defaults
>>     MM>         to N number of bytes to the log file no matter what 
>> the speed of
>>     MM>         log writes are.
>>     MM>       o currently may flood the I/O system causing user 
>> reads/writes to
>>     MM>         stall - on  some OS/JVM's this stall is  amazing like 
>> ten's of
>>     MM>         seconds.
>>     MM>       o  It is not  important that  checkpoints run  fast, it  
>> is more
>>     MM>         important  that it  prodede methodically  to  
>> conclusion while
>>     MM>         causing a little  interuption to "real" work by  user 
>> threads.     MM>         Various approaches to this were discussed, 
>> but no patches yet.
>>
>> For the scheduling of checkpoints, I was hoping Raymond would come up
>> with something.  Raymond are you still with us?
>>
>> I have discussed our I/O architecture with Solaris engineers, and our
>> approach of doing buffered writes followed by a fsync, I was told was
>> the worst approach on Solaris.  They recommended using direct I/O.  I
>> guess there will be situations were single-threaded direct I/O for
>> checkpointing will give too low throughput.  In that case, we could
>> consider a pool of writers.  The challenge would then be how to give
>> priority to user-initiated requests over multi-threaded checkpoint
>> writes as discussed above.
>>
> 
> 


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> excellent, I look forward to your work on concurrent I/O.  I am likely
    MM> to not be on the list much for the next 2 weeks, so won't be able to
    MM> help much.  In thinking about this issue I was hoping that somehow
    MM> the current container cache could be enhanced to support more than
    MM> one open container per container.  Then one would automatically get
    MM> control over the open file resource across all containers, by setting
    MM> the currently supported "max" on the container pool.

I have thought about using the current cache implementation for
caching open file descriptors.  However, I have not considered using
the container cache for this.  Will this work?  Doesn't some of the
code assume that there is only one object for each container?  (E.g.,
truncation, backup.)


    MM> The challenge is that this would be a new concept for the basic services
    MM> cache implementation.  What we want is a cache that supports multiple
    MM> objects with the same key, and that returns an available one if another
    MM> one is "busy".  Also returns a newly opened one, if all are busy.  I
    MM> am going to start a thread on this, to see if any other help is
    MM> available.  If possible  I like  this approach  better than  having a
    MM> queue of open files per container  where it hard to control the growth
    MM> of one queue vs. the growth in another.

I agree with you that a common cache would be better.  However, as you
point out, it is not straight-forward to use the current cache
implementation.  My first thought was that it would be too much work
to extend the current caching framework for this use, but I will think
a bit more about this, and follow up in your thread on this.


    MM> On the checkpoint issue, I would not have a problem with changes to the
    MM> current mechanism  to do "rwd" type  sync I/O rather than  sync at end
    MM> (but we will have to support both until we don't have to support older
    MM> versions of JVM's).  I believe this is as close
    MM> to  "direct i/o"  as we  can get  from java  - if  you  mean something
    MM> different here let me know.  

Yes, this was what I was thinking of.  Using direct I/O should give
better performance on reads.  However, some configurations may see a
performance drop since they depend on the file system buffer for good
performance.  Such applications should be able to reclaim the
performance by increasing the page cache so that they do not depend on
file system buffering.
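
As a concrete (and purely illustrative) example of that tuning knob,
the size of Derby's page cache is controlled by the
derby.storage.pageCacheSize property, counted in pages and read when the
engine boots; the value below is an example, not a recommendation:

  public class PageCacheSizing {
      public static void main(String[] args) throws Exception {
          // Could also be placed in derby.properties; 4000 pages of the default
          // 4K page size is roughly a 16 MB page cache.
          System.setProperty("derby.storage.pageCacheSize", "4000");
          Class.forName("org.apache.derby.jdbc.EmbeddedDriver"); // boots the embedded engine
      }
  }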

    MM> The benefit is that I believe it will fix the checkpoint
    MM> flooding the I/O system problem.  The downside is that it will
    MM> cause total number of I/O's to increase in cases where the
    MM> derby block size is smaller than the filesystem/disk blocksize
    MM> -- assuming the OS currently converts our flood of multiple
    MM> async writes to the same file to a smaller number of bigger
    MM> I/O's.  

How this is done will vary from file system to file system.
It would be nice to do some experiments here.  Maybe someone could do
a study on direct i/o on different file systems.  (E.g., the
performance impact of different page sizes).

    MM> I think this trade off is fine for checkpoints. If
    MM> checkpoint efficiency is an issue, there are a number of other
    MM> ways to address it in the future.

I agree.  Multiple writer threads are one way to address this.

-- 
Øystein


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
excellent, I look forward to your work on concurrent I/O.  I am likely
to not be on the list much for the next 2 weeks, so won't be able to
help much.  In thinking about this issue I was hoping that somehow
the current container cache could be enhanced to support more than
one open container per container.  Then one would automatically get
control over the open file resource across all containers, by setting
the currently supported "max" on the container pool.

The challenge is that this would be a new concept for the basic services
cache implementation.  What we want is a cache that supports multiple
objects with the same key, and that returns an available one if another
one is "busy".  Also returns a newly opened one, if all are busy.  I
am going to start a thread on this, to see if any other help is
available.  If possible I like this approach better than having a queue
of open files per container, where it is hard to control the growth of one
queue vs. the growth in another.
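
To make the idea concrete, a minimal sketch (the OpenFileCache class and
its names are hypothetical, and it ignores the global "max" and the
eviction that a real basic services cache would enforce) of a cache that
keeps several open files under the same container key, hands out an idle
one, and opens another when all are busy:

  import java.io.File;
  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.util.ArrayList;
  import java.util.HashMap;

  class OpenFileCache {
      // container key -> list of idle RandomAccessFile handles for that container
      private final HashMap idleFilesByContainer = new HashMap();

      public synchronized RandomAccessFile checkOut(Object containerKey, File containerFile)
              throws IOException {
          ArrayList idle = (ArrayList) idleFilesByContainer.get(containerKey);
          if (idle != null && !idle.isEmpty()) {
              return (RandomAccessFile) idle.remove(idle.size() - 1); // reuse an idle handle
          }
          return new RandomAccessFile(containerFile, "r");            // all busy: open one more
      }

      public synchronized void checkIn(Object containerKey, RandomAccessFile file) {
          ArrayList idle = (ArrayList) idleFilesByContainer.get(containerKey);
          if (idle == null) {
              idle = new ArrayList();
              idleFilesByContainer.put(containerKey, idle);
          }
          idle.add(file); // a real cache would close surplus handles to respect the configured max
      }
  }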

On the checkpoint issue, I would not have a problem with changes to the
current mechanism to do "rwd" type sync I/O rather than sync at end (but 
we will have to support both until we don't have to support older 
versions of JVM's).  I believe this is as close
to "direct i/o" as we can get from java - if you mean something 
different here let me know.  The benefit is that I believe it will fix
the checkpoint flooding the I/O system problem.  The downside is that
it will cause total number of I/O's to increase in cases where the
derby block size is smaller than the filesystem/disk blocksize -- 
assuming the OS currently converts our flood of multiple async writes to 
the same file to a smaller number of bigger I/O's.  I think this trade 
off is fine for checkpoints.  If checkpoint efficiency is an issue, 
there are a number of other ways to address it in the future.

Øystein Grøvlen wrote:
>>>>>>"MM" == Mike Matrigali <mi...@sbcglobal.net> writes:
> 
> 
>     MM> user thread initiated read
>     MM>      o should  be high priority and  should be "fair"  with other user
>     MM>        initiated reads.
> 
>     MM>      o These happen anytime a read of a row causes a cache miss.
>     MM>      o Currently only one I/O operation to a file can happen at a time,
>     MM>        could be big problem for some types of multi-threaded,
>     MM>        highly concurrent low number of table apps.  I think
>     MM>        the path here should be to increase the number of
>     MM>        concurrent I/O's allowed to be outstanding by allowing
>     MM>        each thread to have 1 (assuming sufficient open file
>     MM>        resources).  100 outstanding I/O's to a single file may
>     MM>        be overkill, but in java we can't know that the file is
>     MM>        not actually 100 disks underneath.  The number of I/O's
>     MM>        should grow as the actual application load increases,
>     MM>        note I still think max I/O's should be tied to number
>     MM>        of user threads, plus maybe a small number for
>     MM>        background processing.
> 
> There was an interesting paper at the last VLDB conference that
> discussed the virtue of having many outstanding I/O requests:
>     http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf (paper)
>     http://www.vldb2005.org/program/slides/wed/s1116-hall.pdf (slides)
> 
> The basic message is that many outstanding requests are good.  The
> SCSI controller they used in their study was able to handle 32
> concurrent requests.  One reason database systems have been
> conservative with respect to outstanding requests is that they want to
> control the priority of the I/O requests.  We would like user thread
> initiated requests to have priority over checkpoint initiated writes.
> (The authors suggest building priorities into the file system to solve
> this.)
> 
> I plan to start working on a patch for allowing more concurrency
> between readers within a few weeks.  The main challenge is to find the
> best way to organize the open file descriptors (reuse, limit the max.
> number etc.)  I will file a JIRA for this.
> 
> I also think we should consider mechanisms for read ahead.
> 
>     MM> user thread initiated write
>     MM>       o same issues as user initiated read.
>     MM>       o happens way less than read, as it should only happen on a cache
>     MM>         miss that can't find a non-dirty page in the cache.  background
>     MM>         cache cleaner  should be  keeping this from  happening, though
>     MM>         apps that only do updates and cause cache hits are worst case.
> 
> 
>     MM> checkpoint initiated write:
>     MM>       o sometimes too many checkpoints happen in too short a time.
>     MM>       o needs an improved scheduling algorithm, currently just defaults
>     MM>         to N number of bytes to the log file no matter what the speed of
>     MM>         log writes are.
>     MM>       o currently may flood the I/O system causing user reads/writes to
>     MM>         stall - on  some OS/JVM's this stall is  amazing like ten's of
>     MM>         seconds.
>     MM>       o  It is not  important that  checkpoints run  fast, it  is more
>     MM>         important  that it  prodede methodically  to  conclusion while
>     MM>         causing a little  interuption to "real" work by  user threads. 
>     MM>         Various approaches to this were discussed, but no patches yet.
> 
> For the scheduling of checkpoints, I was hoping Raymond would come up
> with something.  Raymond are you still with us?
> 
> I have discussed our I/O architecture with Solaris engineers, and our
> approach of doing buffered writes followed by a fsync, I was told was
> the worst approach on Solaris.  They recommended using direct I/O.  I
> guess there will be situations were single-threaded direct I/O for
> checkpointing will give too low throughput.  In that case, we could
> consider a pool of writers.  The challenge would then be how to give
> priority to user-initiated requests over multi-threaded checkpoint
> writes as discussed above.
> 


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "MM" == Mike Matrigali <mi...@sbcglobal.net> writes:

    MM> user thread initiated read
    MM>      o should  be high priority and  should be "fair"  with other user
    MM>        initiated reads.

    MM>      o These happen anytime a read of a row causes a cache miss.
    MM>      o Currently only one I/O operation to a file can happen at a time,
    MM>        could be big problem for some types of multi-threaded,
    MM>        highly concurrent low number of table apps.  I think
    MM>        the path here should be to increase the number of
    MM>        concurrent I/O's allowed to be outstanding by allowing
    MM>        each thread to have 1 (assuming sufficient open file
    MM>        resources).  100 outstanding I/O's to a single file may
    MM>        be overkill, but in java we can't know that the file is
    MM>        not actually 100 disks underneath.  The number of I/O's
    MM>        should grow as the actual application load increases,
    MM>        note I still think max I/O's should be tied to number
    MM>        of user threads, plus maybe a small number for
    MM>        background processing.

There was an interesting paper at the last VLDB conference that
discussed the virtue of having many outstanding I/O requests:
    http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf (paper)
    http://www.vldb2005.org/program/slides/wed/s1116-hall.pdf (slides)

The basic message is that many outstanding requests are good.  The
SCSI controller they used in their study was able to handle 32
concurrent requests.  One reason database systems have been
conservative with respect to outstanding requests is that they want to
control the priority of the I/O requests.  We would like user thread
initiated requests to have priority over checkpoint initiated writes.
(The authors suggest building priorities into the file system to solve
this.)

I plan to start working on a patch for allowing more concurrency
between readers within a few weeks.  The main challenge is to find the
best way to organize the open file descriptors (reuse, limit the max.
number etc.)  I will file a JIRA for this.

I also think we should consider mechanisms for read ahead.

    MM> user thread initiated write
    MM>       o same issues as user initiated read.
    MM>       o happens way less than read, as it should only happen on a cache
    MM>         miss that can't find a non-dirty page in the cache.  background
    MM>         cache cleaner  should be  keeping this from  happening, though
    MM>         apps that only do updates and cause cache hits are worst case.


    MM> checkpoint initiated write:
    MM>       o sometimes too many checkpoints happen in too short a time.
    MM>       o needs an improved scheduling algorithm, currently just defaults
    MM>         to N number of bytes to the log file no matter what the speed of
    MM>         log writes are.
    MM>       o currently may flood the I/O system causing user reads/writes to
    MM>         stall - on some OS/JVMs this stall is amazingly long, like
    MM>         tens of seconds.
    MM>       o It is not important that checkpoints run fast; it is more
    MM>         important that they proceed methodically to conclusion while
    MM>         causing little interruption to "real" work by user threads.
    MM>         Various approaches to this were discussed, but no patches yet.

For the scheduling of checkpoints, I was hoping Raymond would come up
with something.  Raymond, are you still with us?

I have discussed our I/O architecture with Solaris engineers, and I was
told that our approach of doing buffered writes followed by an fsync is
the worst approach on Solaris.  They recommended using direct I/O.  I
guess there will be situations where single-threaded direct I/O for
checkpointing will give too low throughput.  In that case, we could
consider a pool of writers.  The challenge would then be how to give
priority to user-initiated requests over multi-threaded checkpoint
writes, as discussed above.

-- 
Øystein


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Originally I would have preferred a separate loadable module approach, 
but I sent Bryan down that path in the lock manager and it seemed like
a daunting task to get the new file to build under jdk1.5.
And now I realize that it may require a Derby-level vote to require
jdk1.5 to build the system. At the time I didn't realize we already
had reflection-based calling of jdk1.5 stuff in the code.

I looked at the reflection calls in SQLDecimal.java and this didn't
seem that different. Do you think that code should have been in a
separate module, or is that a different case?
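
For readers unfamiliar with the reflection pattern being discussed, here is a
rough, hypothetical sketch of how a class can pick up
java.util.concurrent.locks.ReentrantLock via reflection when the JVM provides
it, and otherwise let the caller fall back to plain synchronization. The class
and method names are invented for illustration; this is not the actual Derby
code.

    import java.lang.reflect.Method;

    // Hypothetical sketch: use ReentrantLock when the JVM has it, otherwise
    // report failure so the caller keeps using the old synchronized path.
    class PageReadLock {
        private final Object lock;          // a ReentrantLock, or null on pre-1.5 JVMs
        private final Method lockMethod;
        private final Method unlockMethod;

        PageReadLock() {
            Object l = null;
            Method lm = null;
            Method um = null;
            try {
                Class c = Class.forName("java.util.concurrent.locks.ReentrantLock");
                l = c.newInstance();
                lm = c.getMethod("lock", (Class[]) null);
                um = c.getMethod("unlock", (Class[]) null);
            } catch (Throwable t) {
                // class not found or reflection failed: disable the feature
                l = null;
                lm = null;
                um = null;
            }
            lock = l;
            lockMethod = lm;
            unlockMethod = um;
        }

        /** Returns true if the lock was taken; false means "use the old code path". */
        boolean lock() {
            if (lock == null) {
                return false;
            }
            try {
                lockMethod.invoke(lock, (Object[]) null);
                return true;
            } catch (Exception e) {
                return false;
            }
        }

        void unlock() {
            try {
                unlockMethod.invoke(lock, (Object[]) null);
            } catch (Exception e) {
                // nothing sensible to do; the read has already completed
            }
        }
    }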

I now realize that one can still build separate modules using the
Derby module system to load the appropriate code, and go ahead and use
reflection in the separate module to localize the "if (java5)" kind of
code.  I hadn't considered this or the use of Derby latches when
reviewing the fix.  I am sure it is not clear from the public interface
description that the Derby locking manager guarantees "fair" scheduling,
though I know it does (there are actually some cases for low-priority
background threads where it would not do that - but that is another
story).

This module and a few others could use some improvements based on
some observations on the list:
o current user thread I/O sometimes causes user threads to be starved
o current checkpoint writes sometimes cause user threads to be starved
o current I/O is single-threaded per file (a file maps to a table or an
  index)

It is clear that the single point of contention in RAFContainer is a
problem that is showing up on bigger, more concurrent machines.  From
the evidence that has been coming in on the list, I think the following
needs to be solved somehow:

user thread initiated read
     o should be high priority and should be "fair" with other user
       initiated reads.
     o These happen anytime a read of a row causes a cache miss.
     o Currently only one I/O operation to a file can happen at a time,
       which could be a big problem for some types of multi-threaded,
       highly concurrent apps that use a low number of tables.  I think
       the path here should be to increase the number of concurrent I/O's
       allowed to be outstanding by allowing each thread to have 1
       (assuming sufficient open file resources).  100 outstanding I/O's
       to a single file may be overkill, but in Java we can't know that
       the file is not actually 100 disks underneath.  The number of
       I/O's should grow as the actual application load increases; note
       that I still think max I/O's should be tied to the number of user
       threads, plus maybe a small number for background processing.

user thread initiated write
      o same issues as user initiated read.
      o happens far less often than reads, as it should only happen on a
        cache miss that can't find a non-dirty page in the cache.  The
        background cache cleaner should be keeping this from happening,
        though apps that only do updates and cause cache hits are the
        worst case.

checkpoint initiated write:
      o sometimes too many checkpoints happen in too short a time.
      o needs an improved scheduling algorithm; currently it just defaults
        to N bytes written to the log file, no matter what the speed of
        the log writes is.
      o currently may flood the I/O system, causing user reads/writes to
        stall - on some OS/JVMs this stall is amazingly long, like tens of
        seconds.
      o It is not important that checkpoints run fast; it is more
        important that they proceed methodically to conclusion while
        causing little interruption to "real" work by user threads.
        Various approaches to this were discussed, but no patches yet.


Daniel John Debrunner (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12360463 ] 
> 
> Daniel John Debrunner commented on DERBY-733:
> ---------------------------------------------
> 
> The issue I have with this current patch is that it is localized to one use in RAFContainer, when reading files.
> Most likely there are other locations where such a facility would be useful, especially on the write path in RAFContainer.
> Are we going to have similar if (java5) statements everywhere? The module API already supports loading different
> code for different environments; I think that this functionality could be added to the LockManager or maybe a separate
> module. This would be an improvement, but I think it would be a mistake to have code similar to this patch in many
> areas of Derby. And yes, maybe we could work on improving latch performance along these lines.
> 
> 
>>Starvation in RAFContainer.readPage()
>>-------------------------------------
>>
>>         Key: DERBY-733
>>         URL: http://issues.apache.org/jira/browse/DERBY-733
>>     Project: Derby
>>        Type: Improvement
>>  Components: Performance, Store
>>    Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>> Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>>    Reporter: Knut Anders Hatlen
>>    Assignee: Knut Anders Hatlen
>> Attachments: DERBY-733.diff
>>
>>When Derby is completely disk bound, threads might be starved in
>>RAFContainer.readPage(). This is a real problem when multiple clients
>>are repeatedly accessing one or a small number of large tables. In
>>cases like this, I have observed very high maximum response times
>>(several minutes in the worst cases) on simple transactions. The
>>average response time is not affected by this.
>>The starvation is caused by a synchronized block in
>>RAFContainer.readPage():
>>  synchronized (this) {
>>      fileData.seek(pageOffset);
>>      fileData.readFully(pageData, 0, pageSize);
>>  }
>>If many threads want to read pages from the same file, there will be a
>>long queue of threads waiting for this monitor. Since the Java
>>specification does not guarantee that threads waiting for monitors are
>>treated fairly, some threads might have to wait for a long time before
>>they get the monitor. (Usually, a couple of threads get full throughput
>>while the others have to wait.)
> 
> 


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
> Daniel John Debrunner commented on DERBY-733:
> ---------------------------------------------
>
> The issue I have with this current patch is that it is localized to
> one use in RAFContainer, when reading files.  Most likely there are
> other locations where such a facility would be useful, especially on
> the write path in RAFContainer. Are we going to have similar if (java5)
> statements everywhere? The module API already supports loading
> different code for different environments; I think that this
> functionality could be added to the LockManager or maybe a separate
> module. This would be an improvement, but I think it would be a
> mistake to have code similar to this patch in many areas of
> Derby. And yes, maybe we could work on improving latch performance
> along these lines.

I agree that if we find other parts of the Derby code that need this
facility, we should either extend an existing module or create a new
module that provides it. The LockManager is probably the right place
to put it. Since the community seems to be interested in rewriting
much of Derby's I/O system, we could do this as a part of the rewrite.

-- 
Knut Anders


[jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Daniel John Debrunner (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12360463 ] 

Daniel John Debrunner commented on DERBY-733:
---------------------------------------------

The issue I have with this current patch is that it is localized to one use in RAFContainer, when reading files.
Most likely there are other locations where such a facility would be useful, especially on the write path in RAFContainer.
Are we going to have similar if (java5) statements everywhere? The module API already supports loading different
code for different environments; I think that this functionality could be added to the LockManager or maybe a separate
module. This would be an improvement, but I think it would be a mistake to have code similar to this patch in many
areas of Derby. And yes, maybe we could work on improving latch performance along these lines.

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>  Attachments: DERBY-733.diff
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Mike Matrigali (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]

Mike Matrigali updated DERBY-733:
---------------------------------

    type: Improvement  (was: Bug)

I agree something should be done to address this issue.  I liked the proposed solution of enabling multiple
open files on a single container, with some mechanism to tie these open files into a single open file cache
so that the resource is not unlimited.  

Marking it as an improvement since the current code works; it is just not optimal.
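
Purely to make the proposed direction concrete, a bounded per-container pool
of open file handles could look roughly like the sketch below. The class name,
API and the way the cap is enforced are hypothetical; this is not Derby's
actual container or file-cache code.

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.LinkedList;

    // Hypothetical sketch of a bounded pool of open handles on one file, so
    // several page reads can be in flight at once while the total number of
    // descriptors stays capped.
    class FileHandlePool {
        private final File file;
        private final int maxHandles;
        private final LinkedList idle = new LinkedList();  // idle RandomAccessFile objects
        private int openCount = 0;

        FileHandlePool(File file, int maxHandles) {
            this.file = file;
            this.maxHandles = maxHandles;
        }

        /** Borrow an open handle; blocks while all handles are in use. */
        synchronized RandomAccessFile checkOut()
                throws IOException, InterruptedException {
            while (idle.isEmpty() && openCount >= maxHandles) {
                wait();                                     // all handles busy
            }
            if (!idle.isEmpty()) {
                return (RandomAccessFile) idle.removeFirst();
            }
            openCount++;                                    // grow lazily up to the cap
            return new RandomAccessFile(file, "r");
        }

        /** Return a handle so another reader can use it. */
        synchronized void checkIn(RandomAccessFile raf) {
            idle.addLast(raf);
            notify();
        }
    }

A reader would check a handle out, do its seek()/readFully() pair on that
private handle, and check it back in, so the monitor is held only for the pool
bookkeeping and not for the I/O itself.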

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen

>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-733:
--------------------------------

    Derby Categories: [Performance]

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>                 Key: DERBY-733
>                 URL: https://issues.apache.org/jira/browse/DERBY-733
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 10.1.2.1, 10.1.3.1, 10.2.1.6
>         Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.2.1.6
>
>         Attachments: DERBY-733-more-exception-handling.diff, DERBY-733.diff, Insert.java, Select.java
>
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
I have reviewed/committed this patch as svn 357275.
Knut Anders Hatlen (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]
> 
> Knut Anders Hatlen updated DERBY-733:
> -------------------------------------
> 
>     Attachment: DERBY-733-more-exception-handling.diff
> 
> Attached patch (DERBY-733-more-exception-handling.diff) that addresses
> Mike's concerns for exception handling. If something goes wrong when
> locking, Derby will now fall back to the old behaviour.
> 
> Derbyall ran without failures.
> 
> % svn stat -q
> M      java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
> 
> 
>>Starvation in RAFContainer.readPage()
>>-------------------------------------
>>
>>         Key: DERBY-733
>>         URL: http://issues.apache.org/jira/browse/DERBY-733
>>     Project: Derby
>>        Type: Improvement
>>  Components: Performance, Store
>>    Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>> Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>>    Reporter: Knut Anders Hatlen
>>    Assignee: Knut Anders Hatlen
>> Attachments: DERBY-733-more-exception-handling.diff, DERBY-733.diff, Insert.java, Select.java
>>
>>When Derby is completely disk bound, threads might be starved in
>>RAFContainer.readPage(). This is a real problem when multiple clients
>>are repeatedly accessing one or a small number of large tables. In
>>cases like this, I have observed very high maximum response times
>>(several minutes in the worst cases) on simple transactions. The
>>average response time is not affected by this.
>>The starvation is caused by a synchronized block in
>>RAFContainer.readPage():
>>  synchronized (this) {
>>      fileData.seek(pageOffset);
>>      fileData.readFully(pageData, 0, pageSize);
>>  }
>>If many threads want to read pages from the same file, there will be a
>>long queue of threads waiting for this monitor. Since the Java
>>specification does not guarantee that threads waiting for monitors are
>>treated fairly, some threads might have to wait for a long time before
>>they get the monitor. (Usually, a couple of threads get full throughput
>>while the others have to wait.)
> 
> 


[jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]

Knut Anders Hatlen updated DERBY-733:
-------------------------------------

    Attachment: DERBY-733-more-exception-handling.diff

Attached a patch (DERBY-733-more-exception-handling.diff) that addresses
Mike's concerns about exception handling. If something goes wrong when
locking, Derby will now fall back to the old behaviour.

Derbyall ran without failures.

% svn stat -q
M      java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>  Attachments: DERBY-733-more-exception-handling.diff, DERBY-733.diff, Insert.java, Select.java
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Suresh Thalamati <su...@gmail.com> writes:

> Hi Knut,
>
> I am really puzzled by such a large difference in response
> times between the jdk142 and jdk15 JVMs. Did you find out what has
> changed in jdk15 that is making the response time so bad?

I was told that there weren't any changes in thread scheduling between
1.4.2 and 1.5, so I guess it's just that monitors/synchronization
have been optimized with more focus on throughput than on fairness. I
think synchronization has become cheaper in 1.5; at least Derby runs a
lot faster on 1.5 than on 1.4.2 in a multi-client environment when it
is CPU bound. But there's always a trade-off...

-- 
Knut Anders


Re: [jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Suresh Thalamati <su...@gmail.com>.
Hi Knut,

I am really puzzled by such a large difference in response
times between the jdk142 and jdk15 JVMs. Did you find out what has
changed in jdk15 that is making the response time so bad?

 >      Sun JVM 1.4.2: 78.3 tps /    727 ms max resp
 >      Sun JVM 1.5.0: 78.2 tps / 609483 ms max resp


Thanks
suresht

Knut Anders Hatlen (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]
> 
> Knut Anders Hatlen updated DERBY-733:
> -------------------------------------
> 
>     Attachment: Insert.java
>                 Select.java
> 
> I have attached a test case that makes it possible to reproduce the
> reported issue. There are two files:
> 
> 
>   Before revision 356884 was committed:
> 

>      IBM JVM 1.4.2: 78.2 tps /   1079 ms max resp
>    Blackdown 1.4.2: 78.9 tps /    717 ms max resp

>      IBM JVM 1.5.0: Went into infinite loop because of I/O error
> 
>   After revision 356884 was committed:
> 
>      Sun JVM 1.4.2: 78.7 tps /  690 ms max resp
>      IBM JVM 1.4.2: 78.1 tps / 1080 ms max resp
>    Blackdown 1.4.2: 79.3 tps /  656 ms max resp
>      Sun JVM 1.5.0: 79.1 tps /  682 ms max resp
>      IBM JVM 1.5.0: Went into infinite loop because of I/O error
> 
> 

> 


[jira] Updated: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-733?page=all ]

Knut Anders Hatlen updated DERBY-733:
-------------------------------------

    Attachment: Insert.java
                Select.java

I have attached a test case that makes it possible to reproduce the
reported issue. There are two files:

* Insert.java:

   Creates a database "mydb" in the current directory and fills it
   with about five gigabytes of data. (If you have more than 1 GB of
   RAM, you might consider increasing the database size since some
   operating systems use all available main memory for file system
   caching, and then you won't be testing disk accesses.) The database
   consists of one table with two columns:

     (id int primary key, text char(100))

   The command "java Insert" will start the program.

* Select.java:

   Start the program with "java Select <clients> <seconds>", where
   <clients> is the number of clients you want to test and <seconds>
   is the number of seconds the test will run. The program will start
   as many clients as requested, and each client will repeatedly
   select a random row from the table generated by the Insert
   program. Every ten seconds the progress of each client is printed
   (number of transactions in the last ten seconds). When the test has
   finished, it prints the throughput and the avg/min/max response
   time.
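
To make the description above concrete, a client loop along these lines might
look roughly like the following sketch. It is not the attached Select.java;
the table name, the assumed row count and the way ids are chosen are
illustrative guesses.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.Random;

    // Hypothetical sketch of one disk-bound read client.
    public class RandomReadClient implements Runnable {
        private static final int ROWS = 50000000;   // assumed size of the test table
        private final long stopTime;
        private long maxResponseMillis;

        RandomReadClient(long stopTime) {
            this.stopTime = stopTime;
        }

        public void run() {
            Random random = new Random();
            try {
                Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
                Connection c = DriverManager.getConnection("jdbc:derby:mydb");
                PreparedStatement ps =
                    c.prepareStatement("SELECT text FROM bigtable WHERE id = ?");
                while (System.currentTimeMillis() < stopTime) {
                    long start = System.currentTimeMillis();
                    ps.setInt(1, 1 + random.nextInt(ROWS));    // pick a random row
                    ResultSet rs = ps.executeQuery();
                    while (rs.next()) {
                        rs.getString(1);                        // fetch the row
                    }
                    rs.close();
                    long elapsed = System.currentTimeMillis() - start;
                    if (elapsed > maxResponseMillis) {
                        maxResponseMillis = elapsed;            // worst-case response time
                    }
                }
                ps.close();
                c.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("max response time: " + maxResponseMillis + " ms");
        }
    }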

I have run this test (40 clients/10 minutes) on Linux 2.6.14 with the
following results (this is NOT a benchmark):

  Before revision 356884 was committed:

     Sun JVM 1.4.2: 78.3 tps /    727 ms max resp
     IBM JVM 1.4.2: 78.2 tps /   1079 ms max resp
   Blackdown 1.4.2: 78.9 tps /    717 ms max resp
     Sun JVM 1.5.0: 78.2 tps / 609483 ms max resp
     IBM JVM 1.5.0: Went into infinite loop because of I/O error

  After revision 356884 was committed:

     Sun JVM 1.4.2: 78.7 tps /  690 ms max resp
     IBM JVM 1.4.2: 78.1 tps / 1080 ms max resp
   Blackdown 1.4.2: 79.3 tps /  656 ms max resp
     Sun JVM 1.5.0: 79.1 tps /  682 ms max resp
     IBM JVM 1.5.0: Went into infinite loop because of I/O error

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.1.2.1, 10.2.0.0, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>  Attachments: DERBY-733.diff, Insert.java, Select.java
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Daniel John Debrunner (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12360447 ] 

Daniel John Debrunner commented on DERBY-733:
---------------------------------------------

I wonder if we can use the existing lock manager, which provides predictable queueing behaviour on granting locks,
rather than adding a new mechanism.

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>  Attachments: DERBY-733.diff
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
It would be interesting to know the actual performance difference 
between using the java lock call and a lock manager latch.  Using the
latch would mean that we would get the same performance benefit on
jdk1.4 as well.

As I said, it would be nice if you submitted your worst-case test -- even
if it is not appropriate as a nightly unit test.  That way Derby users
could run it against a variety of hardware, OSes and JVMs.  Something
simple that shows the problem you are fixing, like a single big table with
many threads accessing more pages than fit in the cache.

I have also wondered what the cost to btree performance is of the current
use of lock manager based latches vs. something closer to a test/set
of a bit on the page.  We do get a benefit in that latch/lock deadlocks are
reported correctly by the Derby lock manager, which is not possible
once you introduce a different locking mechanism.

Do note that I believe it was argued in a different thread that adding
this new lock was OK, as the performance did not matter that much
compared to the I/O cost of the read.

Øystein Grøvlen (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12360454 ] 
> 
> Øystein Grøvlen commented on DERBY-733:
> ---------------------------------------
> 
> Is it possible to use the lock manager for a lock that is local to the object?  I do not think it is a good idea to enter the lock into the general lock pool.  That seems like quite a lot of unnecessary overhead.  
> 
> I also think that in the long run we should consider replacing the current latches with the built-in locks provided by Java.
> 
> 
> 
>>Starvation in RAFContainer.readPage()
>>-------------------------------------
>>
>>         Key: DERBY-733
>>         URL: http://issues.apache.org/jira/browse/DERBY-733
>>     Project: Derby
>>        Type: Improvement
>>  Components: Performance, Store
>>    Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>> Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>>    Reporter: Knut Anders Hatlen
>>    Assignee: Knut Anders Hatlen
>> Attachments: DERBY-733.diff
>>
>>When Derby is completely disk bound, threads might be starved in
>>RAFContainer.readPage(). This is a real problem when multiple clients
>>are repeatedly accessing one or a small number of large tables. In
>>cases like this, I have observed very high maximum response times
>>(several minutes in the worst cases) on simple transactions. The
>>average response time is not affected by this.
>>The starvation is caused by a synchronized block in
>>RAFContainer.readPage():
>>  synchronized (this) {
>>      fileData.seek(pageOffset);
>>      fileData.readFully(pageData, 0, pageSize);
>>  }
>>If many threads want to read pages from the same file, there will be a
>>long queue of threads waiting for this monitor. Since the Java
>>specification does not guarantee that threads waiting for monitors are
>>treated fairly, some threads might have to wait for a long time before
>>they get the monitor. (Usually, a couple of threads get full throughput
>>while the others have to wait.)
> 
> 


[jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Øystein Grøvlen (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12360454 ] 

Øystein Grøvlen commented on DERBY-733:
---------------------------------------

Is it possible to use the lock manager for a lock that is local to the object?  I do not think it is a good idea to enter the lock into the general lock pool.  That seems like quite a lot of unnecessary overhead.  

I also think that in the long run we should consider replacing the current latches with the built-in locks provided by Java.


> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Improvement
>   Components: Performance, Store
>     Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>  Attachments: DERBY-733.diff
>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> It should be possible to build a 1.5 specific solution, though Bryan has
> not succeeded yet in getting the build side of the solution to work.  Of
> course such a solution will only make 1.5 and future versions go faster.
> In general I think it is better to use the native java support if
> possible, but of course that is up to you.  It may also be the case
> that
> your added class is less work than reorganizing the code to provide a
> pre-1.4 version and post-1.5 version.

I think it is less work adding a lock class than reorganizing the
code. Since I'm going to use the same method names and signatures as
the native java class, we can just modify the variable declaration and
remove the class as soon as we abandon support for 1.4. I'll give this
approach a try first, and if it doesn't work out well, I can look at a
1.5 specific solution.

> I know someone had mentioned this issue before and that they had a queue
> of open I/O working (rather than a queue of waiters on the single
> I/O), did I miss you posting a patch for that?

Øystein posted a patch which allowed more than one thread to read from
the same file at the same time. His patch was posted merely as an
example, and it would probably run out of file descriptors if there
were many tables in the database. I have tested his patch, but the
variance in the response times was still way too high. The primary
objective with his patch was to increase the throughput on disk-bound
databases, and at this point it was in fact quite successful.

-- 
Knut Anders


Re: [jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by Mike Matrigali <mi...@sbcglobal.net>.
It should be possible to build a 1.5 specific solution, though Bryan has
not succeeded yet in getting the build side of the solution to work.  Of
course such a solution will only make 1.5 and future versions go faster.
In general I think it is better to use the native java support if 
possible, but of course that is up to you.  It may also be the case that
your added class is less work than reorganizing the code to provide a
pre-1.4 version and post-1.5 version.

I know someone had mentioned this issue before and that they had a queue
of open I/Os working (rather than a queue of waiters on the single I/O).
Did I miss you posting a patch for that?

Knut Anders Hatlen (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12358925 ] 
> 
> Knut Anders Hatlen commented on DERBY-733:
> ------------------------------------------
> 
> I have tried to put calls to the lock() and unlock() methods in the
> Java 1.5 ReentrantLock class around the synchronized block. With this
> change, the difference between the maximum response time and the
> average response time is decreased to an acceptable level.
> 
> Since Derby can't rely on features from Java 1.5, we have to implement
> our own lock class which works on Java 1.3 and 1.4. This should be
> relatively simple. I will try to implement a class which can be used
> as a drop-in replacement for java.util.concurrent.locks.ReentrantLock
> and test it to be sure that it doesn't pose any significant
> overhead. I think the overhead posed by maintaining a queue of waiters
> will be small compared to the time it takes to read a page from the
> disk.
> 
> 
>>Starvation in RAFContainer.readPage()
>>-------------------------------------
>>
>>         Key: DERBY-733
>>         URL: http://issues.apache.org/jira/browse/DERBY-733
>>     Project: Derby
>>        Type: Bug
>>  Components: Performance, Store
>>    Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>> Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>>    Reporter: Knut Anders Hatlen
>>    Assignee: Knut Anders Hatlen
> 
> 
>>When Derby is completely disk bound, threads might be starved in
>>RAFContainer.readPage(). This is a real problem when multiple clients
>>are repeatedly accessing one or a small number of large tables. In
>>cases like this, I have observed very high maximum response times
>>(several minutes in the worst cases) on simple transactions. The
>>average response time is not affected by this.
>>The starvation is caused by a synchronized block in
>>RAFContainer.readPage():
>>  synchronized (this) {
>>      fileData.seek(pageOffset);
>>      fileData.readFully(pageData, 0, pageSize);
>>  }
>>If many threads want to read pages from the same file, there will be a
>>long queue of threads waiting for this monitor. Since the Java
>>specification does not guarantee that threads waiting for monitors are
>>treated fairly, some threads might have to wait for a long time before
>>they get the monitor. (Usually, a couple of threads get full throughput
>>while the others have to wait.)
> 
> 


[jira] Commented: (DERBY-733) Starvation in RAFContainer.readPage()

Posted by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-733?page=comments#action_12358925 ] 

Knut Anders Hatlen commented on DERBY-733:
------------------------------------------

I have tried to put calls to the lock() and unlock() methods in the
Java 1.5 ReentrantLock class around the synchronized block. With this
change, the difference between the maximum response time and the
average response time is decreased to an acceptable level.

Since Derby can't rely on features from Java 1.5, we have to implement
our own lock class which works on Java 1.3 and 1.4. This should be
relatively simple. I will try to implement a class which can be used
as a drop-in replacement for java.util.concurrent.locks.ReentrantLock
and test it to be sure that it doesn't impose any significant
overhead. I think the overhead imposed by maintaining a queue of waiters
will be small compared to the time it takes to read a page from the
disk.
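
As a rough illustration of the first paragraph above, wrapping the page read
in an explicit lock (shown here against the 1.5 class directly, without any
1.3/1.4 fallback) might look like the following. Only the field names from
the snippet in the issue description are reused; this is not the committed
Derby code.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch: an explicit ReentrantLock taken around the page read, so
    // waiting readers are queued instead of competing for the object monitor.
    class PageReader {
        private final RandomAccessFile fileData;
        private final ReentrantLock readLock = new ReentrantLock();

        PageReader(RandomAccessFile fileData) {
            this.fileData = fileData;
        }

        void readPage(long pageOffset, byte[] pageData, int pageSize)
                throws IOException {
            readLock.lock();
            try {
                fileData.seek(pageOffset);
                fileData.readFully(pageData, 0, pageSize);
            } finally {
                readLock.unlock();
            }
        }
    }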

> Starvation in RAFContainer.readPage()
> -------------------------------------
>
>          Key: DERBY-733
>          URL: http://issues.apache.org/jira/browse/DERBY-733
>      Project: Derby
>         Type: Bug
>   Components: Performance, Store
>     Versions: 10.2.0.0, 10.1.2.1, 10.1.3.0, 10.1.2.2
>  Environment: Solaris x86 and Linux with Sun JVM 1.5.0. Derby embedded and client/server.
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen

>
> When Derby is completely disk bound, threads might be starved in
> RAFContainer.readPage(). This is a real problem when multiple clients
> are repeatedly accessing one or a small number of large tables. In
> cases like this, I have observed very high maximum response times
> (several minutes in the worst cases) on simple transactions. The
> average response time is not affected by this.
> The starvation is caused by a synchronized block in
> RAFContainer.readPage():
>   synchronized (this) {
>       fileData.seek(pageOffset);
>       fileData.readFully(pageData, 0, pageSize);
>   }
> If many threads want to read pages from the same file, there will be a
> long queue of threads waiting for this monitor. Since the Java
> specification does not guarantee that threads waiting for monitors are
> treated fairly, some threads might have to wait for a long time before
> they get the monitor. (Usually, a couple of threads get full throughput
> while the others have to wait.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira