Posted to derby-dev@db.apache.org by Raymond Raymond <ra...@hotmail.com> on 2005/10/27 03:41:28 UTC

Some idea about automatic checkpointing issue

Oystein wrote:
>
>I would like to suggest the following:
>         - 1. The user may be able to configure a certain recovery time
>           that Derby should try to satisfy.  (An appropriate default
>           must be determined).
>         - 2. During initialization of Derby, we run some measurement that
>           determines the performance of the system and maps the
>           recovery time into some X megabytes of log.
>         - 3. A checkpoint is made by default every X megabytes of log.
>         - 4. One tries to dynamically adjust the write rate of the
>           checkpoint so that the writing takes an entire checkpoint
>           interval.  (E.g., write Y pages, then pause for some time).
>         - 5. If data reads or log writes (if the log is in the default
>           location) start to have long response times, one can increase
>           the checkpoint interval.  The user should be able to turn this
>           feature off in case longer recovery times are not acceptable.
>
>Hope this rambling has some value,
>
>--
>Øystein
>
Thanks for your comments, Oystein. I agree with them, and I have
another thought to add. To make it easier to explain, I added
sequence numbers to your list.
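
As a rough illustration of steps 2 and 4 in the quoted list, the sketch
below maps a target recovery time to X bytes of log and computes a pause
between write batches so that one checkpoint spans the whole interval.
`CheckpointPlanner`, its method names, and every parameter are
hypothetical and are not Derby APIs; the redo rate is assumed to come
from the startup measurement mentioned in step 2.

```java
/** Hypothetical helper; names and units are assumptions, not Derby APIs. */
final class CheckpointPlanner {
    private CheckpointPlanner() {}

    /** Step 2: map a target recovery time to X bytes of log, given a measured redo rate. */
    static long logBytesForRecoveryTime(double targetSeconds, double redoBytesPerSecond) {
        if (targetSeconds <= 0 || redoBytesPerSecond <= 0) {
            throw new IllegalArgumentException("inputs must be positive");
        }
        return (long) (targetSeconds * redoBytesPerSecond);
    }

    /**
     * Step 4: pause between batches so that writing all dirty pages
     * takes roughly one checkpoint interval.
     */
    static long pauseMillisBetweenBatches(int dirtyPages, int pagesPerBatch,
                                          long intervalMillis, long writeMillisPerBatch) {
        if (pagesPerBatch <= 0 || intervalMillis <= 0 || writeMillisPerBatch < 0) {
            throw new IllegalArgumentException("invalid batch size or timing");
        }
        if (dirtyPages <= 0) {
            return intervalMillis; // nothing to write; sleep through the interval
        }
        int batches = (dirtyPages + pagesPerBatch - 1) / pagesPerBatch; // ceiling division
        long pause = intervalMillis / batches - writeMillisPerBatch;
        return Math.max(0L, pause); // never negative if writes are slower than the slot
    }
}
```

For example, with 1000 dirty pages, batches of 50 pages, a 60-second
interval, and 200 ms per batch, the writer would pause 2800 ms between
batches.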

For steps 3 and 4 I have another idea. Generally, we do checkpointing
from the earliest useful log record, which is determined by the
repPoint and the undoLWM, whichever is earlier, to the current
log instant (redoLWM), and then update the Derby control file (ref.
http://db.apache.org/derby/papers/recovery.html). I agree with
you about spreading the writes out over the checkpoint interval, but
the trade-off is that we have to do recovery from the penultimate
checkpoint (am I right here? ^_^). If the log is long, recovery will
take a long time. How about updating the Derby control file
periodically instead of only when the whole checkpoint is done?
(E.g., write several pages and, if we detect that the system is busy,
update the control file and pause for some time; or update the
control file once every several minutes.) That way we do part of a
checkpoint at a time if the system becomes busy. If the system then
crashes, the last checkpoint mark (the log address up to which the
last checkpoint got) will be closer to the tail of the log than if we
had updated the control file only when the whole checkpoint was done.
Maybe we can call it Incremental Checkpointing.

I will keep thinking about this issue to find a good way to do it.
Comments from everyone are welcome.

Thanks.


Raymond



Re: Some idea about automatic checkpointing issue

Posted by Raymond Raymond <ra...@hotmail.com>.
>From: Oystein.Grovlen@Sun.COM (Øystein Grøvlen)
>Reply-To: <de...@db.apache.org>
>To: derby-dev@db.apache.org
>Subject: Re: Some idea about automatic checkpointing issue
>Date: Mon, 31 Oct 2005 15:36:39 +0100
>
> >>>>> "RR" == Raymond Raymond <ra...@hotmail.com> writes:
>
>     RR> Oystein wrote:
>     >>
>     >> I would like to suggest the following:
>     >> - 1. The user may be able to configure a certain recovery time
>     >> that Derby should try to satisfy.  (An appropriate default
>     >> must be determined).
>     >> - 2. During initialization of Derby, we run some measurement that
>     >> determines the performance of the system and maps the
>     >> recovery time into some X megabytes of log.
>     >> - 3. A checkpoint is made by default every X megabytes of log.
>     >> - 4. One tries to dynamically adjust the write rate of the
>     >> checkpoint so that the writing takes an entire checkpoint
>     >> interval.  (E.g., write Y pages, then pause for some time).
>     >> - 5. If data reads or log writes (if the log is in the default
>     >> location) start to have long response times, one can increase
>     >> the checkpoint interval.  The user should be able to turn this
>     >> feature off in case longer recovery times are not acceptable.
>     >>
>     >> Hope this rambling has some value,
>     >>
>     >> --
>     >> Øystein
>     >>
>     RR> Thanks for your comments, Oystein. I agree with them, and I have
>     RR> another thought to add. To make it easier to explain, I added
>     RR> sequence numbers to your list.
>
>     RR> For steps 3 and 4 I have another idea. Generally, we do checkpointing
>     RR> from the earliest useful log record, which is determined by the
>     RR> repPoint and the undoLWM, whichever is earlier, to the current
>     RR> log instant (redoLWM), and then update the Derby control file (ref.
>     RR> http://db.apache.org/derby/papers/recovery.html).
>
>I am not sure I understand what you mean by "do checkpointing".  Are
>you talking about writing the checkpoint log record to the log?
>
>     RR> I agree with you about spreading the writes out over the
>     RR> checkpoint interval, but the trade-off is that we have to do
>     RR> recovery from the penultimate checkpoint (am I right here? ^_^).
>     RR> If the log is long, recovery will take a long time.
>
>From the perspective of recovery, it will still be the checkpoint
>reflected in the log control file.  It is true that a new checkpoint
>had probably been started when the crash occurred, but that may happen
>today also.  It is less likely, but the principles are the same.
>
>I agree that there will be more log to redo during recovery.  The
>advantage of my proposal is that the recovery time will be more
>deterministic, since it will be less dependent on how long it takes
>to clean the page cache.  The average log size for recovery will
>always be 1.5 checkpoint intervals with my proposal.  The maximum log
>size will be 2 checkpoint intervals, and this is also true for the
>current solution.  If the goal is to guarantee a maximum recovery
>time, I think my proposal is better.  There is no point in reducing
>performance in order to be able to do recovery in 30 seconds if the
>user is willing to accept recovery times of 2 minutes.
>
>     RR> How about updating the Derby control file periodically instead
>     RR> of only when the whole checkpoint is done? (E.g., write several
>     RR> pages and, if we detect that the system is busy, update the
>     RR> control file and pause for some time; or update the control
>     RR> file once every several minutes.)
>
>I guess that is possible, but in that case, you will need to have some
>way of determining the redoLWM for the checkpoint.  It will no longer
>be the current log instant when the checkpoint starts.  I guess you
>can do this by either scanning the entire page cache or by keeping the
>pages sorted by age.
>
>     RR> That way we do part of a checkpoint at a time if the system
>     RR> becomes busy. If the system then crashes, the last checkpoint
>     RR> mark (the log address up to which the last checkpoint got) will
>     RR> be closer to the tail of the log than if we had updated the
>     RR> control file only when the whole checkpoint was done. Maybe we
>     RR> can call it Incremental Checkpointing.
>
>Unless each checkpoint cleans the entire cache, the redoLWM may be
>much older than the last checkpoint mark.  Hence, updating the
>control file more often does not reduce recovery times by itself.
>However, making sure that the oldest dirty pages are written at a
>checkpoint should advance the redoLWM and reduce recovery times.
>
>--
>Øystein
>

Last time, when I discussed the automatic checkpointing issue with you
and Mike, I suggested that we could maintain a dirty page list in which
dirty pages are sorted in ascending order of the time when they were
first updated. I don't mean that we need to copy the whole page into
the list, just some identifier that lets us find the corresponding page
later. When a page is first updated, it is linked into the list, and
when it is flushed to disk, it is removed from the list. In this way,
the oldest dirty pages will be at the head and the latest at the tail.
When we do checkpointing, we scan from the head to the end of the list.
I think that will guarantee that the oldest dirty pages are written in
a checkpoint.
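
A minimal sketch of such a list, assuming pages are identified by a
numeric id and that updates are reported in log order. `DirtyPageList`
and its methods are hypothetical names, not part of Derby's cache
manager, and the class is not thread-safe. It uses a `LinkedHashMap`
so that iteration order equals the order of first update.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongConsumer;

/** Hypothetical sketch of a dirty page list ordered by first update. */
final class DirtyPageList {
    // page id -> log instant of the first update since the page was last clean.
    // Insertion order is preserved, so the head is the oldest dirty page.
    private final Map<Long, Long> firstUpdateInstant = new LinkedHashMap<>();

    /** Call on every update; only the first update links the page into the list. */
    void pageUpdated(long pageId, long logInstant) {
        firstUpdateInstant.putIfAbsent(pageId, logInstant);
    }

    /** Call whenever a page reaches disk by any path other than flushOldest. */
    void pageFlushed(long pageId) {
        firstUpdateInstant.remove(pageId);
    }

    /** Write up to maxPages pages starting from the head; returns how many were written. */
    int flushOldest(int maxPages, LongConsumer pageWriter) {
        int flushed = 0;
        Iterator<Map.Entry<Long, Long>> it = firstUpdateInstant.entrySet().iterator();
        while (flushed < maxPages && it.hasNext()) {
            long pageId = it.next().getKey();
            pageWriter.accept(pageId); // assumed to write the page synchronously
            it.remove();
            flushed++;
        }
        return flushed;
    }

    /**
     * Candidate redoLWM: the first-update instant of the oldest page that is
     * still dirty, or the current log instant when nothing is dirty.
     */
    long redoLowWaterMark(long currentLogInstant) {
        Iterator<Long> it = firstUpdateInstant.values().iterator();
        return it.hasNext() ? it.next() : currentLogInstant;
    }
}
```

After each partial flush, the value returned by `redoLowWaterMark`
is what a checkpoint could record in the control file, since every
page dirtied before that instant has already been written.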


Thanks.


Raymond



Re: Some idea about automatic checkpointing issue

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
>>>>> "RR" == Raymond Raymond <ra...@hotmail.com> writes:

    RR> Oystein wrote:
    >> 
    >> I would like to suggest the following:
    >> - 1. The user may be able to configure a certain recovery time
    >> that Derby should try to satisfy.  (An appropriate default
    >> must be determined).
    >> - 2. During initialization of Derby, we run some measurement that
    >> determines the performance of the system and maps the
    >> recovery time into some X megabytes of log.
    >> - 3. A checkpoint is made by default every X megabytes of log.
    >> - 4. One tries to dynamically adjust the write rate of the
    >> checkpoint so that the writing takes an entire checkpoint
    >> interval.  (E.g., write Y pages, then pause for some time).
    >> - 5. If data reads or log writes (if the log is in the default
    >> location) start to have long response times, one can increase
    >> the checkpoint interval.  The user should be able to turn this
    >> feature off in case longer recovery times are not acceptable.
    >> 
    >> Hope this rambling has some value,
    >> 
    >> --
    >> Øystein
    >> 
    RR> Thanks for your comments, Oystein. I agree with them, and I have
    RR> another thought to add. To make it easier to explain, I added
    RR> sequence numbers to your list.

    RR> For steps 3 and 4 I have another idea. Generally, we do checkpointing
    RR> from the earliest useful log record, which is determined by the
    RR> repPoint and the undoLWM, whichever is earlier, to the current
    RR> log instant (redoLWM), and then update the Derby control file (ref.
    RR> http://db.apache.org/derby/papers/recovery.html).

I am not sure I understand what you mean by "do checkpointing".  Are
you talking about writing the checkpoint log record to the log?

    RR> I agree with you about spreading the writes out over the
    RR> checkpoint interval, but the trade-off is that we have to do
    RR> recovery from the penultimate checkpoint (am I right here? ^_^).
    RR> If the log is long, recovery will take a long time.

From the perspective of recovery, it will still be the checkpoint
reflected in the log control file.  It is true that a new checkpoint
had probably been started when the crash occurred, but that may happen
today also.  It is less likely, but the principles are the same.

I agree that there will be more log to redo during recovery.  The
advantage of my proposal is that the recovery time will be more
deterministic, since it will be less dependent on how long it takes
to clean the page cache.  The average log size for recovery will
always be 1.5 checkpoint intervals with my proposal.  The maximum log
size will be 2 checkpoint intervals, and this is also true for the
current solution.  If the goal is to guarantee a maximum recovery
time, I think my proposal is better.  There is no point in reducing
performance in order to be able to do recovery in 30 seconds if the
user is willing to accept recovery times of 2 minutes.
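
The 1.5 and 2 interval figures can be checked with a small calculation.
The sketch below rests on simplifying assumptions that are not spelled
out in the thread: each checkpoint takes exactly one interval to
complete, the next one starts as soon as the previous one finishes, log
is generated at a steady rate, and the crash is equally likely at any
point. `RecoveryLogEstimate` is a hypothetical name.

```java
/** Hypothetical estimate of how much log must be redone after a crash. */
final class RecoveryLogEstimate {
    private RecoveryLogEstimate() {}

    /**
     * Log to redo when the crash happens `fraction` (0.0 to 1.0) of the way
     * through the interval that follows completion of the recorded checkpoint.
     */
    static double redoLogAtCrash(double intervalBytes, double fraction) {
        // The recorded checkpoint started one full interval before it
        // completed, so its redoLWM is already one interval old.
        return intervalBytes + fraction * intervalBytes;
    }

    /** Average over evenly spaced crash points (midpoint sampling). */
    static double averageRedoLog(double intervalBytes, int samples) {
        if (samples <= 0) {
            throw new IllegalArgumentException("samples must be positive");
        }
        double sum = 0.0;
        for (int i = 0; i < samples; i++) {
            double fraction = (i + 0.5) / samples;
            sum += redoLogAtCrash(intervalBytes, fraction);
        }
        return sum / samples;
    }
}
```

Under these assumptions the redo length ranges from one interval (crash
right after a checkpoint completes) to two intervals (crash just before
the next one completes), which averages to about 1.5 intervals.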

    RR> How about updating the Derby control file periodically instead
    RR> of only when the whole checkpoint is done? (E.g., write several
    RR> pages and, if we detect that the system is busy, update the
    RR> control file and pause for some time; or update the control
    RR> file once every several minutes.)

I guess that is possible, but in that case, you will need to have some
way of determining the redoLWM for the checkpoint.  It will no longer
be the current log instant when the checkpoint starts.  I guess you
can do this by either scanning the entire page cache or by keeping the
pages sorted by age.
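
The first option, scanning the entire page cache, could look roughly
like the sketch below. `CacheScan` and `CachedPage` are hypothetical
types, not Derby classes; the sketch assumes each cached page remembers
the log instant of the first update made since it was last clean.

```java
import java.util.Collection;

/** Hypothetical full-cache scan to find the redoLWM. */
final class CacheScan {
    private CacheScan() {}

    /** Hypothetical page descriptor; a negative firstDirtyInstant means the page is clean. */
    static final class CachedPage {
        final long pageId;
        final long firstDirtyInstant;

        CachedPage(long pageId, long firstDirtyInstant) {
            this.pageId = pageId;
            this.firstDirtyInstant = firstDirtyInstant;
        }

        boolean isDirty() {
            return firstDirtyInstant >= 0;
        }
    }

    /**
     * Returns the smallest first-dirty instant among dirty pages, or the
     * current log instant when every page is clean. Cost is linear in the
     * cache size, which is the price of not keeping pages sorted by age.
     */
    static long redoLowWaterMark(Collection<CachedPage> cache, long currentLogInstant) {
        long lwm = currentLogInstant;
        for (CachedPage page : cache) {
            if (page.isDirty() && page.firstDirtyInstant < lwm) {
                lwm = page.firstDirtyInstant;
            }
        }
        return lwm;
    }
}
```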

    RR> That way we do part of a checkpoint at a time if the system
    RR> becomes busy. If the system then crashes, the last checkpoint
    RR> mark (the log address up to which the last checkpoint got) will
    RR> be closer to the tail of the log than if we had updated the
    RR> control file only when the whole checkpoint was done. Maybe we
    RR> can call it Incremental Checkpointing.

Unless each checkpoint cleans the entire cache, the redoLWM may be
much older than the last checkpoint mark.  Hence, updating the
control file more often does not reduce recovery times by itself.
However, making sure that the oldest dirty pages are written at a
checkpoint should advance the redoLWM and reduce recovery times.
 
-- 
Øystein