Posted to derby-dev@db.apache.org by "Dag H. Wanvik (JIRA)" <ji...@apache.org> on 2007/04/18 02:05:15 UTC

[jira] Updated: (DERBY-1947) OutOfMemoryError after repeated calls to boot and shutdown a database

     [ https://issues.apache.org/jira/browse/DERBY-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-1947:
---------------------------------

    Attachment: ProcedureInTriggerTest-threadstacks.txt

I was getting ready to commit the patch for this issue, but then I
updated my trunk and found that the following newly
introduced/converted JUnit test makes Derby deadlock when the patch
(DERBY-1947-2) is active:

  ProcedureInTriggerTest

It looks like a deadlock in the VM (see also the enclosed stacks of
the involved threads), caused by the patch's use of finalizer code
which leads to synchronization, and I am presently at a loss as to
how best to resolve this.

What seems to happen is this:
 
* In ProcedureInTriggerTest:388, a call to Statement#execute is performed:
   > s.execute("create trigger select_from_trig_table no cascade before " + 
                               "delete on t1 for each STATEMENT call proc_reads_sql(1)");

  This call synchronizes on the current (root) connection object.

* At this time I believe there are still two nested connection objects
  from previous statements (the test uses triggers and stored
  procedures called from them), which are no longer referenced but
  whose finalizers have yet to run (they have not yet been garbage
  collected).

* Now, CreateTriggerConstantAction#executeConstantAction, at line
  275, uses the dependency manager to send an invalidate to the
  trigger table so that DML statements on this table will be
  recompiled (see the stack of thread "main" in the attachment
  ProcedureInTriggerTest-threadstacks.txt).

* As a result of this, GenericPreparedStatement#prepareToInvalidate
  is called, which in turn calls
  GenericLanguageConnectionContext#verifyNoOpenResultSets.

* verifyNoOpenResultSets contains this sequence of calls:

        // There may be open ResultSet's that are yet to be garbage collected
        // let's try and force these out rather than throw an error
        System.gc();
        System.runFinalization();

* This last call causes the deadlock. It seems to happen because an
  attempt is made to run the finalizers of the said nested
  connections. With the patch, the finalizer of EmbedConnection
  contains a call to EmbedConnection#close, which synchronizes on the
  root connection object (already locked by the "main" thread as part
  of the execute). The finalizers run in a thread forked by the call
  to System.runFinalization(), which then waits for that finalizer
  thread to complete (see Finalizer#forkSecondaryFinalizer's call to
  sft.join() ca line 115 in Sun JDK 1.6). The finalizers are stuck
  waiting for the lock held by "main", so deadlock ensues. (A minimal
  sketch of this pattern follows the list below.)

* Now, without the call to System.runFinalization(), the test runs
  successfully, presumably because the two connection objects will
  then be garbage collected later, after s.execute() has terminated.

  Also, removing the call to System.gc() removes the deadlock, but
  this breaks the test (verifyNoOpenResultSets throws X0X95.S
  - LANG_CANT_INVALIDATE_OPEN_RESULT_SET in line 1733):
    
     "Operation 'CREATE TRIGGER' cannot be performed on object 'T1'
      because there is an open ResultSet dependent on that object."

  So it would seem that calling System.gc() is sufficient, and that
  System.runFinalization is the harmful part, since it waits for
  finalization to finish, which is dangerous if the calling thread
  already holds locks. *Or* one could argue that finalizers should
  never call anything that could require a lock (too strict in my
  view; a sketch of that discipline also follows below).
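
For the record, here is a minimal, self-contained sketch of the
deadlock pattern described above. This is not Derby code; all names
(FinalizerDeadlockDemo, NestedConnection, rootLock) are made up for
illustration:

    // FinalizerDeadlockDemo.java -- hypothetical reproduction, not Derby code.
    public class FinalizerDeadlockDemo {

        // Stands in for the root EmbedConnection monitor.
        static final Object rootLock = new Object();

        static class NestedConnection {
            // Mirrors the patched EmbedConnection finalizer, whose close()
            // synchronizes on the root connection.
            protected void finalize() {
                synchronized (rootLock) {
                    System.out.println("finalizer got the lock");
                }
            }
        }

        public static void main(String[] args) {
            new NestedConnection();       // unreachable right away, like the nested connections
            synchronized (rootLock) {     // like Statement#execute on the root connection
                System.gc();              // hint only; collection is not guaranteed
                System.runFinalization(); // joins the finalizer thread, which is
                                          // blocked on rootLock above => deadlock
            }
            System.out.println("only reached if the GC did not finalize in time");
        }
    }

Since System.gc() is only a hint, this demo is not guaranteed to hang
on every run; but when the VM does finalize the object while the
monitor is held, "main" blocks inside runFinalization exactly as in
the test.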
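
And here is a hedged sketch of the "finalizers must not take locks"
discipline mentioned above: the finalizer only records that cleanup
is owed, and the lock-taking close() runs later from a thread that
holds no monitors. Again, everything here (DeferredCloseSketch,
pendingCleanups, drainCleanups) is invented for illustration; it is
not what the patch does:

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // DeferredCloseSketch.java -- hypothetical, not Derby code.
    public class DeferredCloseSketch {

        static final Queue<Runnable> pendingCleanups =
                new ConcurrentLinkedQueue<Runnable>();

        static class Connection {
            void close() {
                // May synchronize on other objects; safe here because
                // the caller holds no monitors.
                System.out.println("closed");
            }

            protected void finalize() {
                // Lock-free: just remember that cleanup is owed. Note
                // that capturing 'this' resurrects the object until
                // the cleanup has run.
                pendingCleanups.add(new Runnable() {
                    public void run() { close(); }
                });
            }
        }

        // To be called only from code that holds no locks.
        static void drainCleanups() {
            for (Runnable r; (r = pendingCleanups.poll()) != null; ) {
                r.run();
            }
        }

        public static void main(String[] args) {
            new Connection();        // immediately unreachable
            System.gc();             // hint only
            System.runFinalization();
            drainCleanups();
        }
    }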

So, now I am trying to understand what would be the correct approach
to resolve this.

- Without the call to close in the finalizer of EmbedConnection, we
  have DERBY-1947, that is, ContextManager objects referenced from
  the HashSet 'allContexts' of the ContextService may leak.

- With the call to close in the finalizer of EmbedConnection, we risk
  deadlock as long as any Derby code calls System.runFinalization()
  anywhere. Grepping, I see this happens in three places in the
  engine:
        
  LowMemory.java:95: System.runFinalization();
  GenericLanguageConnectionContext.java:1617: System.runFinalization();
  GenericLanguageConnectionContext.java:1709: System.runFinalization();

Possible approaches:

1) Maybe we could change the HashSet 'allContexts' of the
   ContextService to hold its elements by weak reference, perhaps a
   WeakHashSet, if there is such a thing (see the sketch after this
   list). That way, the ContextManager objects would eventually be
   cleaned out.

2) Remove the explicit calls to System.runFinalization. Keeping just
   the System.gc() call seems less risky, but it may be less
   effective, so it could break existing apps. System.gc() is not
   guaranteed to yield results anyway...

3) Figure out a way to explicitly close the result sets in question,
   so the gc calls would be redundant. I am not sure how easy this
   would be; they are probably there for a good reason.

4) Remove the calls and allow the execution to fail as indicated in
   the comment of verifyNoOpenResultSets:

   // There may be open ResultSet's that are yet to be garbage collected
   // let's try and force these out rather than throw an error
   System.gc();
   System.runFinalization();
 
   This could break existing apps.

5) Other ideas...?
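
As for approach 1: as far as I know, the JDK has no WeakHashSet
class, but since Java 6 a set with weakly held elements can be
derived from WeakHashMap via Collections.newSetFromMap. A minimal
sketch (the field name allContexts is from this issue; the rest is
illustrative):

    import java.util.Collections;
    import java.util.Set;
    import java.util.WeakHashMap;

    public class WeakSetSketch {
        // Entries disappear once their element is otherwise unreachable,
        // so the set cannot pin ContextManager-like objects in memory.
        static final Set<Object> allContexts =
                Collections.synchronizedSet(
                        Collections.newSetFromMap(
                                new WeakHashMap<Object, Boolean>()));

        public static void main(String[] args) throws InterruptedException {
            Object cm = new Object();  // stands in for a ContextManager
            allContexts.add(cm);
            System.out.println(allContexts.size()); // 1
            cm = null;                 // drop the only strong reference
            System.gc();               // hint only; not guaranteed
            Thread.sleep(100);
            System.out.println(allContexts.size()); // usually 0 after GC
        }
    }

One caveat we would have to verify: with weak elements, a
ContextManager must stay strongly referenced from its live
connection, or it could be collected while still in use.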

Finally, can this deadlock be considered a Java run-time error? If
not, what is the contract being broken by the Derby code in the
above? I tried to search a bit for guidance on the advisability of
explicitly running runFinalization while holding Java locks, in the
presence of finalizers that can take locks, but I didn't find
anything conclusive...

Any advice is appreciated :)



> OutOfMemoryError after repeated calls to boot and shutdown a database
> ---------------------------------------------------------------------
>
>                 Key: DERBY-1947
>                 URL: https://issues.apache.org/jira/browse/DERBY-1947
>             Project: Derby
>          Issue Type: Bug
>          Components: Services
>    Affects Versions: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.1.2.1, 10.1.3.1, 10.2.1.6
>         Environment: Any
>            Reporter: Rajesh Kartha
>         Assigned To: Dag H. Wanvik
>         Attachments: DERBY-1947-1.diff, DERBY-1947-1.stat, DERBY-1947-2.diff, DERBY-1947-2.stat, HeapAnalysis_op.jpg, ProcedureInTriggerTest-threadstacks.txt, testEmbedAndClient.java
>
>
> I came across this OOM issue while running some system tests involving
> backup and restore against Derby. The test is expected to run forever, but
> using the default heap space it runs into OOM within 2 days. I mentioned this earlier
> in my reply to the 10.2.1.6 vote - http://www.nabble.com/Re%3A--VOTE--10.2.1.6-release-p6650528.html
> There have also been some discussions on the list on related topics:
> http://issues.apache.org/jira/browse/DERBY-23 and
> http://www.nabble.com/question-about-shutdown-tf2300504.html
> Basic Analysis:
> --------------------
> Wrote a simple Java app (attached to this issue) that booted and shut down the same
> database multiple times. Depending on the heap size, the program ran into the
> OOM at some point, as expected. Some heap dump analysis using the IBM HeapAnalyzer
> revealed that the HashSet (allContexts) within the org.apache.derby.iapi.services.context.ContextService
> class seemed to be the location of the leak (snapshot of the heap analysis attached).
> A little bit of debugging shows that:
>  - for every connection, two ContextManager objects (say, cm1, cm2) are added to the HashSet
>  - for every shutdown, a new ContextManager object (say, cm3) is added and two objects are removed
>  - the objects removed are cm2 and cm3 - in that sequence
>  - but the object cm1 gets left behind
> This happens for every connect/shutdown sequence, and since one of the ContextManager objects added to the
> HashSet is not removed as part of the cleanup, it contributes to growth in memory usage, hence
> an OOM eventually.
> For example:
> ============
> A 64M heap could sustain only about 1107 connect/shutdown iterations before running into OOM,
> having created 1108 unused ContextManager objects in memory.
> java -Xmx64M testEmbedAndClient
> ++++Debug: add() Size of allContexts HashSet obj= 1
> ----Debug: remove() Size of allContexts HashSet obj= 0
> ++++Debug: add() Size of allContexts HashSet obj= 1
> ----Debug: remove() Size of allContexts HashSet obj= 0
> ++++Debug: add() Size of allContexts HashSet obj= 1
> ++++Debug: add() Size of allContexts HashSet obj= 2
> ==== Database booted in embedded ====
> ++++Debug: add() Size of allContexts HashSet obj= 3
> ----Debug: remove() Size of allContexts HashSet obj= 2
> ----Debug: remove() Size of allContexts HashSet obj= 1
> ==== Shutdown complete in embedded ====
> ++++Debug: add() Size of allContexts HashSet obj= 2
> ++++Debug: add() Size of allContexts HashSet obj= 3
> ==== Database booted in embedded ====
> ++++Debug: add() Size of allContexts HashSet obj= 4
> ----Debug: remove() Size of allContexts HashSet obj= 3
> ----Debug: remove() Size of allContexts HashSet obj= 2
> ..
> ..
> ..
> ==== Database booted in embedded ====
> ++++Debug: add() Size of allContexts HashSet obj= 1109
> ----Debug: remove() Size of allContexts HashSet obj= 1108
> ----Debug: remove() Size of allContexts HashSet obj= 1107
> ==== Shutdown complete in embedded ====
> ++++Debug: add() Size of allContexts HashSet obj= 1108
> ++++Debug: add() Size of allContexts HashSet obj= 1109
> ----Debug: remove() Size of allContexts HashSet obj= 1108
> java.sql.SQLException: Failed to start database 'testdb', see the next exception
>  for details.
>         at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:89)
>         at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:95)
>         at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:174)
>         at org.apache.derby.impl.jdbc.EmbedConnection.newSQLException(EmbedConnection.java:1985)
>         at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(EmbedConnection.java:1649)
>         at org.apache.derby.impl.jdbc.EmbedConnection.<init>(EmbedConnection.java:223)
>         at org.apache.derby.impl.jdbc.EmbedConnection30.<init>(EmbedConnection30.java:73)
>         at org.apache.derby.jdbc.Driver30.getNewEmbedConnection(Driver30.java:74)
>         at org.apache.derby.jdbc.InternalDriver.connect(InternalDriver.java:210)
> ----Debug: remove() Size of allContexts HashSet obj= 1107
>         at org.apache.derby.jdbc.AutoloadedDriver.connect(AutoloadedDriver.java:117)
>         at java.sql.DriverManager.getConnection(DriverManager.java:525)
>         at java.sql.DriverManager.getConnection(DriverManager.java:193)
>         at testEmbedAndClient.testInEmbedded(testEmbedAndClient.java:40)
>         at testEmbedAndClient.main(testEmbedAndClient.java:19)
> java.lang.OutOfMemoryError: Java heap space
> OOM happened after 1107 iterations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.