Posted to derby-dev@db.apache.org by "Bergquist, Brett" <BB...@canoga.com> on 2012/09/10 20:07:29 UTC

Had a problem with SYSCS_FREEZE_DATABASE and am wondering is something can be done make this more robust

Because the online backup was taking a long time and affecting performance, and because the customer's system uses the ZFS file system on Solaris, I wrote a utility that does the following:


1. Freezes the database
2. Invokes a system command to perform a ZFS snapshot
3. Unfreezes the database
4. Creates a backup of the ZFS snapshot using 'tar' and 'compress'
5. Removes the ZFS snapshot

The ZFS snapshot takes about 1 or 2 seconds, so the time between step 1 and step 3 is a couple of seconds. The utility has checks to make sure that if step 1 succeeds, step 3 is always performed. The basic logic looks like:

    private void run(String[] args) {
        parseArguments(args);
        loadDbDriver();
        final Connection conn = openDatabaseConnection();

        int res = 0;
        try {
            // Safety net: unfreeze if the JVM exits while the database is frozen.
            Thread shutdownHook = new Thread() {
                @Override
                public void run() {
                    attemptToUnfreezeDatabase(conn);
                }
            };
            Runtime.getRuntime().addShutdownHook(shutdownHook);
            freezeDatabase(conn);
            try {
                // Runs the system command that takes the ZFS snapshot.
                res = executeCopyCommand();
            } finally {
                // Normal path: always unfreeze once the freeze has succeeded.
                unfreezeDatabase(conn);
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            }
        } finally {
            closeDatabaseConnection(conn);
        }

        System.exit(res);
    }

So it registers a shutdown hook and also runs the system-level ZFS snapshot command in a try/finally block, using both to ensure that the unfreeze is done if the freeze was done. This has been working really well each night for about 2 months, but Saturday night something failed.
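
The helper methods called in the logic above (freezeDatabase, unfreezeDatabase) are not shown in the post. A minimal sketch of what they might look like follows, assuming they simply call Derby's SYSCS_UTIL.SYSCS_FREEZE_DATABASE and SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE system procedures over JDBC. The class name FreezeSketch and the whileFrozen helper are hypothetical and are not part of the original utility.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper, not the original utility's code.
final class FreezeSketch {

    // Derby system procedures; both take no arguments.
    static final String FREEZE_SQL = "CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()";
    static final String UNFREEZE_SQL = "CALL SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE()";

    // Executes one procedure call and closes the statement afterwards.
    static void call(Connection conn, String sql) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(sql)) {
            cs.execute();
        }
    }

    // Freezes, runs the work, and always attempts the unfreeze on the
    // same connection once the freeze has succeeded.
    static <T> T whileFrozen(Connection conn, Callable<T> work) throws Exception {
        call(conn, FREEZE_SQL);
        try {
            return work.call();
        } finally {
            call(conn, UNFREEZE_SQL);
        }
    }
}
```

Keeping the freeze and the unfreeze on one connection matches the usage the Derby documentation describes for this kind of backup. It does not help if the JVM dies between the two calls.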

From the stack traces of the Derby engine, it appears that something caused the utility to fail after the database was frozen, and neither the shutdown hook nor the try/finally unfroze the database. After that point, the database was effectively locked up. The system was still operating and kept opening connections to access the database, until all of the available connections were exhausted.

So I was thinking that maybe the database engine should have some sort of protection in case this happens. Maybe the engine should automatically unfreeze the database if the connection that froze it terminates or closes. Or maybe a timeout could be added to the freeze command so that the database is automatically unfrozen after a given period.
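
The timer idea can be approximated on the client side. The following is only a sketch under stated assumptions: FreezeWatchdog is a hypothetical class, not a Derby API, and the unfreeze action is passed in as a Runnable (for example, one that calls SYSCS_UTIL.SYSCS_UNFREEZE_DATABASE on the freezing connection). It runs the action once, either when the timeout expires or when the normal path closes the watchdog, whichever comes first.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical client-side watchdog; create it right after the freeze succeeds.
final class FreezeWatchdog implements AutoCloseable {

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "freeze-watchdog");
                t.setDaemon(true); // never keep the JVM alive just for the timer
                return t;
            });
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final Runnable unfreeze;

    FreezeWatchdog(Runnable unfreeze, long timeout, TimeUnit unit) {
        this.unfreeze = unfreeze;
        // Timeout path: unfreeze if the normal path has not done so in time.
        timer.schedule(this::fire, timeout, unit);
    }

    // Runs the unfreeze action at most once, whichever path gets here first.
    private void fire() {
        if (done.compareAndSet(false, true)) {
            unfreeze.run();
        }
    }

    // Normal path: stop the timer and unfreeze now if that has not happened yet.
    @Override
    public void close() {
        timer.shutdownNow();
        fire();
    }
}
```

Because the timer lives in the utility's own JVM, it covers a snapshot command that hangs but not a JVM that dies, so it does not replace an engine-side timeout like the one suggested here.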

I am thinking this because of what I was told in an earlier thread, when I was trying to build this utility purely as a script: IJ to freeze the database, SH to perform the ZFS snapshot, and IJ to unfreeze the database. I was told that it was not expected that the freeze and unfreeze would be done from separate connections. In fact, I ran into a problem with the utility at that point: the IJ connection to unfreeze could not be created because the database was frozen.

So I guess the question is: is there ever a use case that would require a database to be frozen and not unfrozen before the connection is closed or lost?

RE: Had a problem with SYSCS_FREEZE_DATABASE and am wondering is something can be done make this more robust

Posted by "Bergquist, Brett" <BB...@canoga.com>.

Thanks Dag for taking the time to respond. See the following, which occurred when trying to use FREEZE/UNFREEZE from a script using IJ:

http://markmail.org/search/?q=freeze%20database%20brett#query:freeze%20database%20brett+page:2+mid:e54e5pc43qoymdja+state:results

and also here:

http://markmail.org/search/?q=derby+freeze+backup+brett#query:derby%20freeze%20backup%20brett+page:1+mid:d36xvhbdnuqodqxi+state:results

where Mike Matrigali commented:


Here is the documentation for using freeze/unfreeze to do a backup. The expectation is that the freeze and unfreeze come from the same connection (or at least that is likely what is tested in derby).

http://db.apache.org/derby/docs/dev/adminguide/cadminhubbkup75469.html

I don't remember but think it might be likely that future connection requests are stalled while a database is frozen, as part of work necessary to keep a database in an ok state for a user backup routine to be called. Logically you just need to stop writing transactions but the implementation may just have to stall all connections.

Obviously this does not work well for the 3 separate script steps you describe, but would be good to know if your use case works for the intended use of freeze/unfreeze. And even if derby only supports same connection freeze and unfreeze, we should understand what is expected if the connection executing the freeze fails or exits before doing unfreeze.

So from this, I still think it might be better to automatically unfreeze the database if a connection is lost. In the production environment, it is quite possible that between the freeze and the copy, all of the remaining available connections become used and blocked waiting for the database to unfreeze. If a failure then occurs, there is no way to unfreeze the database.

I have now tried to use freeze/unfreeze from both a script and a utility application to perform a backup as described in the documentation, and both ways I have had something fail and been locked out, unable to invoke unfreeze.


From: Dag Wanvik [mailto:dag.wanvik@oracle.com]
Sent: Monday, September 10, 2012 6:31 PM
To: derby-dev@db.apache.org
Subject: Re: Had a problem with SYSCS_FREEZE_DATABASE and am wondering is something can be done make this more robust



On 10.09.2012 20:07, Bergquist, Brett wrote:

I am thinking this because I was told on a previous emailing when trying to build this utility totally from a script point of view using IJ to freeze the database, SH to perform the ZFS snapshot and IJ to unfreeze the database that it was not expected that the freeze/unfreeze would be done from separate connections. In fact I ran into a problem with the utility at that point where the IJ connection to unfreeze could not be created because the database was frozen.

Hmm. I just tried this, and found I can make another connection even when the database is frozen, although when I try to do an update the operation hangs as expected. Maybe there is a bug that prohibits a new connection in some (hitherto uncharacterized) cases?

Dag




So I guess is there ever a use case that would require a database to be frozen and not unfrozen before the connection is closed/lost?

Re: Had a problem with SYSCS_FREEZE_DATABASE and am wondering is something can be done make this more robust

Posted by Dag Wanvik <da...@oracle.com>.


On 10.09.2012 20:07, Bergquist, Brett wrote:
>
> I am thinking this because I was told on a previous emailing when
> trying to build this utility totally from a script point of view using
> IJ to freeze the database, SH to perform the ZFS snapshot and IJ to
> unfreeze the database that it was not expected that the
> freeze/unfreeze would be done from separate connections.  In fact I ran
> into a problem with the utility at that point where the IJ connection
> to unfreeze could not be created because the database was frozen.
>

Hmm. I just tried this, and found I can make another connection even
when the database is frozen, although when I try to do an update the
operation hangs as expected. Maybe there is a bug that prohibits a new
connection in some (hitherto uncharacterized) cases?

Dag


> So I guess is there ever a use case that would require a database to 
> be frozen and not unfrozen before the connection is closed/lost?
>

Re: Had a problem with SYSCS_FREEZE_DATABASE and am wondering is something can be done make this more robust

Posted by Dag Wanvik <da...@oracle.com>.

On 11.09.2012 22:22, Bergquist, Brett wrote:
> I will try to implement this maybe as a param to the system procedure 
> and supply a patch for possible inclusion into derby

Great! Going forward, we have been thinking about extending support for
management via JMX; unfreezing a frozen database might be a candidate use case.

Dag

Re: Had a problem with SYSCS_FREEZE_DATABASE and am wondering is something can be done make this more robust

Posted by "Bergquist, Brett" <BB...@canoga.com>.

The utility that performs the freeze/zfs snapshot/unfreeze is run at a customer site via cron, and unfortunately the output was overwritten by the next night's run, which blocked trying to freeze the database. I verified that by looking at the stack traces of derby using jstack. Other than the database being frozen and all connections in the pool to derby being in use, the derby engine looked ok.

The reason for the utility failure was lost but neither the try/finally nor the JVM shutdown hook worked to unfreeze the database. Maybe the JVM running the utility crashed but at this point I don't know.
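
One way to avoid losing the evidence next time is to give each cron run its own log file and to append a line after every step. A minimal sketch, assuming a hypothetical RunLog helper and a hypothetical freeze-backup-<timestamp>.log naming scheme (neither is part of the original utility):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical per-run log helper.
final class RunLog {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // One file per run, named after the run's start time, so a later run
    // cannot overwrite the output of an earlier one.
    static Path pathFor(Path dir, LocalDateTime runStart) {
        return dir.resolve("freeze-backup-" + STAMP.format(runStart) + ".log");
    }

    // Opens the file, appends one line, and closes it again on every call,
    // so steps logged before a crash have already been handed to the OS.
    static void append(Path log, String step) throws IOException {
        String line = LocalDateTime.now() + " " + step + System.lineSeparator();
        Files.write(log, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Logging "freeze", "snapshot" and "unfreeze" as separate lines would at least show which step was the last one to complete before the utility stopped.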

I did extensive testing of the utility with normal failure scenarios, like the zfs snapshot failing or interrupting the utility with SIGINT, and these were handled and the database unfreeze was called.

So at this point I don't know what failed, but something did and left the database frozen, so an automatic mechanism in the derby engine would be most welcome.

I will try to implement this, maybe as a param to the system procedure, and supply a patch for possible inclusion into derby.

On Sep 11, 2012, at 2:21 PM, "Dag Wanvik" <da...@oracle.com> wrote:

On 10.09.2012 20:07, Bergquist, Brett wrote:
From the stack traces of the Derby engine, it appears that something causes the utility to fail after the database was frozen and neither the shutdown hook nor the try/finally unfroze the database. So after that point, the database was effectively locked up. The system was still operating and connections were being made trying to access the database exhausting all of the connections.

So if none of your unfreeze attempts worked here, what happened to that process/VM? You say "failed", did it hang, did it complete normally, albeit with no effect, or was it killed off?
Curious, since if we were to implement an automatic unfreeze at connection close, if the method you used didn't work, an automatic unfreeze might fail too (if there is a bug in the code that prevents unfreeze from doing its thing). If it hangs without unfreezing it might be interesting to see the VM state at that point, e.g. via jstack.

So I was thinking that maybe the database engine should have some sort of protection if this were to happen. Maybe the database engine should automatically unfreeze the database if the connection that freezes the database terminates/closes. Or maybe a timer to be added to the freeze command to automatically unfreeze the database after the fact.

I am thinking this because I was told on a previous emailing when trying to build this utility totally from a script point of view using IJ to freeze the database, SH to perform the ZFS snapshot and IJ to unfreeze the database that it was not expected that the freeze/unfreeze would be done from separate connections. In fact I ran into a problem with the utility at that point where the IJ connection to unfreeze could not be created because the database was frozen.

So I guess is there ever a use case that would require a database to be frozen and not unfrozen before the connection is closed/lost?

Re: Had a problem with SYSCS_FREEZE_DATABASE and am wondering is something can be done make this more robust

Posted by Dag Wanvik <da...@oracle.com>.


On 10.09.2012 20:07, Bergquist, Brett wrote:
>
> From the stack traces of the Derby engine, it appears that something 
> causes the utility to fail after the database was frozen and neither 
> the shutdown hook nor the try/finally unfroze the database.   So after 
> that point, the database was effectively locked up.   The system was 
> still operating and connections were being made trying to access the  
> database exhausting all of the connections.
>

So if none of your unfreeze attempts worked here, what happened to that
process/VM? You say "failed", did it hang, did it complete normally,
albeit with no effect, or was it killed off?
Curious, since if we were to implement an automatic unfreeze at
connection close, if the method you used didn't work, an automatic
unfreeze might fail too (if there is a bug in the code that prevents
unfreeze from doing its thing). If it hangs without unfreezing it might
be interesting to see the VM state at that point, e.g. via jstack.

> So I was thinking that maybe the database engine should have some sort
> of protection if this were to happen. Maybe the database engine
> should automatically unfreeze the database if the connection that
> freezes the database terminates/closes. Or maybe a timer to be added
> to the freeze command to automatically unfreeze the database after the
> fact.
>
> I am thinking this because I was told on a previous emailing when
> trying to build this utility totally from a script point of view using
> IJ to freeze the database, SH to perform the ZFS snapshot and IJ to
> unfreeze the database that it was not expected that the
> freeze/unfreeze would be done from separate connections.  In fact I ran
> into a problem with the utility at that point where the IJ connection
> to unfreeze could not be created because the database was frozen.
>
> So I guess is there ever a use case that would require a database to 
> be frozen and not unfrozen before the connection is closed/lost?
>