You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Bergquist, Brett" <BB...@canoga.com> on 2016/04/20 03:30:48 UTC

Is there any debug logging available when a database is being recovered

Another issue with about 1100 log files needing to be processed after a restart of the database network server.  What I would like to know is if there is any logging that can tell when pass the database recovery is on, what percentage is done, etc.  Some feedback that can be used to determine progress.

Thanks.

Brett

Canoga Perkins
20600 Prairie Street
Chatsworth, CA 91311
(818) 718-6300

This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).

Re: Is there any debug logging available when a database is being recovered

Posted by mike matrigali <mi...@gmail.com>.
I dont have much help to offer, only some history.  I do not think there was ever any performance work done
in this area, so hopefully there is some low hanging fruit.  I have a fuzzy memory that restart recover undo was
really inefficient as there is not a good way to walk backward in the log per transaction.

First would be good to understand performance per phase.  redo, vs undo.  i would expect redo to run pretty fast
but everything is very i/o bound if there is a lot of log and lot of database pages being looked at.  The whole
thing is single threaded and probably does not use very much memory.  so a lot of your machine resources
are not being used.  making it parallel is likely hard as there are likely assumptions on it being single threaded
and thus it does not need locks/latches for concurrency (or at least it can set flags saying it is ok with double latching).

to understand the size of the recovery you were looking at, what was the total size of seg0 of your db and what was
the log directory size.   And for log size do you know if you keep more than is required for restart recovery.  It would
be if you have the setting that keeps all old logs so that you can recover from a backup at specific time and then apply
current set of logs.

Also if you get set up to run experiments an easy test would be to run test first with I/O configured as we need for
correctness - ie with system sync write enabled.  And then run it with system write sync disabled.  If this is a big
difference you know the I/O part of problem is sync write vs some other issue.  We have optimized doing as few sync
writes as necessary in "normal" path - but maybe there are recovery paths that do not.

Derby was originally designed to fit very small dbs with very low xact load.  so there are very likely some assumptions
that don't fit your problem set.

On 4/25/2016 6:41 AM, Bergquist, Brett wrote:
> Thanks for the pointes Kristian.   I definitely will be in this code shortly.
>
> The system appeared to not be CPU bound but rather I/O bound.   I need to get a test setup into a condition like at the customer first and then start doing some debugging and profiling to see where things are.
>
> -----Original Message-----
> From: Kristian Waagan [mailto:kristwaa.apache@gmail.com]
> Sent: Saturday, April 23, 2016 4:22 PM
> To: derby-dev@db.apache.org
> Subject: Re: Is there any debug logging available when a database is being recovered
>
> Den 20.04.2016 16.46, skrev Bergquist, Brett:
>> It took longer (and never finished) to process the recovery logs file that it did to restore the database from the previous night from backup (that took 6 hours to "untar")   At about 14 hours, the recovery log files were not finished processing so we had to give up and restore from backup.   That seems quite an inordinate amount of time to process the recovery log files.    Note that this was not an underpowered machine either.   It was an Oracle M5000 with 32 processors, 128G of memory, and Fibre Channel attached storage.
>>
>> I would like to start looking at this, so if anyone has any ideas where to begin, shout them out.
> Hi Brett,
>
> >From what I can see, there is some logging available in debug builds using the LogTrace-flag. I have no recollection of using it, so I cannot say how useful the information you get from enabling it is.
>
> To try it, get the debug build and specify "derby.debug.true=LogTrace".
> Based on a quick scan of the code, without verifying anything, it looks like you should see when Derby switches to the next log file. There is also a log entry for the end of the redo phase.
>
> Obtaining the stack trace at various point might also give some clues if you're willing to dive into the code :)
>
> I expect this part of the code is singlethreaded, so I hope the machine can give you decent performance using a single logical CPU. Do you suspect CPU or IO to be a bottleneck?
>
>
> Regards,
> --
> Kristian
>
>> -----Original Message-----
>> From: Bryan Pendleton [mailto:bpendleton.derby@gmail.com]
>> Sent: Wednesday, April 20, 2016 5:53 AM
>> To: derby-dev@db.apache.org
>> Subject: Re: Is there any debug logging available when a database is
>> being recovered
>>
>>> So what would be useful would be something like:
>>>
>>> Performing database recovery
>>> Starting analysis pass
>>> 215 transactions detected to be processed Starting redo pass ....
>>> anything that could give some feedback Starting undo pass
>>> ....anything that could give some feedback
>>>
>> I'm afraid all I can add is: yes, I completely agree, that would be extremely useful.
>>
>> bryan
>>
>>
>>
>> Canoga Perkins
>> 20600 Prairie Street
>> Chatsworth, CA 91311
>> (818) 718-6300
>>
>> This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).
>>
>
> Canoga Perkins
> 20600 Prairie Street
> Chatsworth, CA 91311
> (818) 718-6300
>
> This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).
>


-- 
email:    Mike Matrigali - mikemapp1@gmail.com
linkedin: https://www.linkedin.com/in/MikeMatrigali


RE: Is there any debug logging available when a database is being recovered

Posted by "Bergquist, Brett" <BB...@canoga.com>.
Thanks for the pointes Kristian.   I definitely will be in this code shortly.

The system appeared to not be CPU bound but rather I/O bound.   I need to get a test setup into a condition like at the customer first and then start doing some debugging and profiling to see where things are.

-----Original Message-----
From: Kristian Waagan [mailto:kristwaa.apache@gmail.com]
Sent: Saturday, April 23, 2016 4:22 PM
To: derby-dev@db.apache.org
Subject: Re: Is there any debug logging available when a database is being recovered

Den 20.04.2016 16.46, skrev Bergquist, Brett:
> It took longer (and never finished) to process the recovery logs file that it did to restore the database from the previous night from backup (that took 6 hours to "untar")   At about 14 hours, the recovery log files were not finished processing so we had to give up and restore from backup.   That seems quite an inordinate amount of time to process the recovery log files.    Note that this was not an underpowered machine either.   It was an Oracle M5000 with 32 processors, 128G of memory, and Fibre Channel attached storage.
>
> I would like to start looking at this, so if anyone has any ideas where to begin, shout them out.

Hi Brett,

>From what I can see, there is some logging available in debug builds using the LogTrace-flag. I have no recollection of using it, so I cannot say how useful the information you get from enabling it is.

To try it, get the debug build and specify "derby.debug.true=LogTrace".
Based on a quick scan of the code, without verifying anything, it looks like you should see when Derby switches to the next log file. There is also a log entry for the end of the redo phase.

Obtaining the stack trace at various point might also give some clues if you're willing to dive into the code :)

I expect this part of the code is singlethreaded, so I hope the machine can give you decent performance using a single logical CPU. Do you suspect CPU or IO to be a bottleneck?


Regards,
--
Kristian

>
> -----Original Message-----
> From: Bryan Pendleton [mailto:bpendleton.derby@gmail.com]
> Sent: Wednesday, April 20, 2016 5:53 AM
> To: derby-dev@db.apache.org
> Subject: Re: Is there any debug logging available when a database is
> being recovered
>
>> So what would be useful would be something like:
>>
>> Performing database recovery
>> Starting analysis pass
>> 215 transactions detected to be processed Starting redo pass ....
>> anything that could give some feedback Starting undo pass
>> ....anything that could give some feedback
>>
>
> I'm afraid all I can add is: yes, I completely agree, that would be extremely useful.
>
> bryan
>
>
>
> Canoga Perkins
> 20600 Prairie Street
> Chatsworth, CA 91311
> (818) 718-6300
>
> This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).
>


Canoga Perkins
20600 Prairie Street
Chatsworth, CA 91311
(818) 718-6300

This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).

Re: Is there any debug logging available when a database is being recovered

Posted by Kristian Waagan <kr...@gmail.com>.
Den 20.04.2016 16.46, skrev Bergquist, Brett:
> It took longer (and never finished) to process the recovery logs file that it did to restore the database from the previous night from backup (that took 6 hours to "untar")   At about 14 hours, the recovery log files were not finished processing so we had to give up and restore from backup.   That seems quite an inordinate amount of time to process the recovery log files.    Note that this was not an underpowered machine either.   It was an Oracle M5000 with 32 processors, 128G of memory, and Fibre Channel attached storage.
> 
> I would like to start looking at this, so if anyone has any ideas where to begin, shout them out.

Hi Brett,

>From what I can see, there is some logging available in debug builds
using the LogTrace-flag. I have no recollection of using it, so I cannot
say how useful the information you get from enabling it is.

To try it, get the debug build and specify "derby.debug.true=LogTrace".
Based on a quick scan of the code, without verifying anything, it looks
like you should see when Derby switches to the next log file. There is
also a log entry for the end of the redo phase.

Obtaining the stack trace at various point might also give some clues if
you're willing to dive into the code :)

I expect this part of the code is singlethreaded, so I hope the machine
can give you decent performance using a single logical CPU. Do you
suspect CPU or IO to be a bottleneck?


Regards,
-- 
Kristian

> 
> -----Original Message-----
> From: Bryan Pendleton [mailto:bpendleton.derby@gmail.com]
> Sent: Wednesday, April 20, 2016 5:53 AM
> To: derby-dev@db.apache.org
> Subject: Re: Is there any debug logging available when a database is being recovered
> 
>> So what would be useful would be something like:
>>
>> Performing database recovery
>> Starting analysis pass
>> 215 transactions detected to be processed Starting redo pass ....
>> anything that could give some feedback Starting undo pass ....anything
>> that could give some feedback
>>
> 
> I'm afraid all I can add is: yes, I completely agree, that would be extremely useful.
> 
> bryan
> 
> 
> 
> Canoga Perkins
> 20600 Prairie Street
> Chatsworth, CA 91311
> (818) 718-6300
> 
> This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).
> 


RE: Is there any debug logging available when a database is being recovered

Posted by "Bergquist, Brett" <BB...@canoga.com>.
It took longer (and never finished) to process the recovery logs file that it did to restore the database from the previous night from backup (that took 6 hours to "untar")   At about 14 hours, the recovery log files were not finished processing so we had to give up and restore from backup.   That seems quite an inordinate amount of time to process the recovery log files.    Note that this was not an underpowered machine either.   It was an Oracle M5000 with 32 processors, 128G of memory, and Fibre Channel attached storage.

I would like to start looking at this, so if anyone has any ideas where to begin, shout them out.

-----Original Message-----
From: Bryan Pendleton [mailto:bpendleton.derby@gmail.com]
Sent: Wednesday, April 20, 2016 5:53 AM
To: derby-dev@db.apache.org
Subject: Re: Is there any debug logging available when a database is being recovered

> So what would be useful would be something like:
>
> Performing database recovery
> Starting analysis pass
> 215 transactions detected to be processed Starting redo pass ....
> anything that could give some feedback Starting undo pass ....anything
> that could give some feedback
>

I'm afraid all I can add is: yes, I completely agree, that would be extremely useful.

bryan



Canoga Perkins
20600 Prairie Street
Chatsworth, CA 91311
(818) 718-6300

This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).

Re: Is there any debug logging available when a database is being recovered

Posted by Bryan Pendleton <bp...@gmail.com>.
> So what would be useful would be something like:
>
> Performing database recovery
> Starting analysis pass
> 215 transactions detected to be processed
> Starting redo pass
> …. anything that could give some feedback
> Starting undo pass
> ….anything that could give some feedback
>

I'm afraid all I can add is: yes, I completely agree, that
would be extremely useful.

bryan



Re: Is there any debug logging available when a database is being recovered

Posted by "Bergquist, Brett" <BB...@canoga.com>.
We are doing a “truss -p <pid> 2>&1 | grep open” but what I am seeing right now is that one complete pass was made through the logs and I thought that it was about done.  Now it is going into a pattern where it processes something like

        logxxxx359.dat
        logxxxx358.dat
        …
        logxxxx242.dat

It is updating some “seg0” files (either re-doing or undoing) and then sometimes creates “logxxxx360.dat” (ie. the next number) and then starts the process over again.   Been doing recovery now for over 5 hours.  Ouch indeed.

Restoring from a backup is about a 5 hour process as well as the database is about 550G.   Just un-taring the backup file takes a substantial amount of time.

The problem occurred when an external system all of a sudden issued 128 concurrent requests to our server where each request removes 10’s of thousands of records in a transaction.  These all caused a gridlock on locking, lock timeouts, the cleanup messages being output to derby-?.log (rolling log support), etc.   The system administrator forcefully shutdown Derby and restarted.

I know the deletion of 10’s of thousands of records in a transaction is bad design.  It did not start out at this scale (a few hundred records to start) but grew into this problem.   The item being deleted represents a piece of network equipment and we can’t have a partial piece of equipment laying around in the database.   Shortly this will be re-designed to mark as deleted and then perform the deletion in the background at little chunks at a time, but of course the issue arose before the solution is complete.

So what would be useful would be something like:

Performing database recovery
Starting analysis pass
215 transactions detected to be processed
Starting redo pass
…. anything that could give some feedback
Starting undo pass
….anything that could give some feedback





> On Apr 20, 2016, at 12:10 AM, Bryan Pendleton <bp...@gmail.com> wrote:
>
>> Another issue with about 1100 log files needing to be
>> processed after a restart of the database network server.
>
> Ouch. :(
>
>> is any logging that can tell when pass the database recovery is on
>
> Perhaps use an operating system level monitoring tool (Process Monitor
> on Windows, strace on Linux, etc.) to see if you can watch the store
> opening and accessing each log file.
>
> I'm supposing that the recovery processing essentially reads those
> logs sequentially, so if you watch it for a while you can see how
> long it takes to finish one log and move on to the next?
>
> bryan
>


Canoga Perkins
20600 Prairie Street
Chatsworth, CA 91311
(818) 718-6300

This e-mail and any attached document(s) is confidential and is intended only for the review of the party to whom it is addressed. If you have received this transmission in error, please notify the sender immediately and discard the original message and any attachment(s).

Re: Is there any debug logging available when a database is being recovered

Posted by Bryan Pendleton <bp...@gmail.com>.
 > Another issue with about 1100 log files needing to be
 > processed after a restart of the database network server.

Ouch. :(

 >  is any logging that can tell when pass the database recovery is on

Perhaps use an operating system level monitoring tool (Process Monitor
on Windows, strace on Linux, etc.) to see if you can watch the store
opening and accessing each log file.

I'm supposing that the recovery processing essentially reads those
logs sequentially, so if you watch it for a while you can see how
long it takes to finish one log and move on to the next?

bryan