Posted to derby-dev@db.apache.org by "Kathey Marsden (JIRA)" <ji...@apache.org> on 2008/11/19 19:06:44 UTC
[jira] Updated: (DERBY-637) Conglomerate does not exist after inserting large data volume
[ https://issues.apache.org/jira/browse/DERBY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kathey Marsden updated DERBY-637:
---------------------------------
Attachment: noContainerBug.java
Here is the old repro. It will need some work to run with current Derby, as compress table has changed, imports have changed, etc. 3653 has some interesting comments regarding the "fix", which seemed to just reduce the window of opportunity for this bug to occur. I don't know if things changed after 3653 or not. Below is the description and comments from the issue:
Description
An application forks 20 threads to update a table (inserts or
deletes depending on the number of rows in the table).
When the number of rows falls to a low water mark, one thread
will do
lock table x in exclusive mode
retrying until it succeeds, then
alter table x compress
The other threads are blocked trying to get read locks, part way
through executing their plan.
Near the end of its work, compress table invalidates plans on this
table, since the conglomerateId has changed for the underlying
store. However, the blocked threads are already using their invalid
plans, and when they get the lock they get the error
"Container {N} not found".
Notes:
I am not sure if this problem is already documented.
I submitted a "fix" which reduces the problem but does not
solve the known race problem with data dictionaries.
Instead of 14 errors we get 1 error now.
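Stripped of Derby internals, the interleaving above can be reproduced with a small, self-contained simulation. All class, method, and field names here are hypothetical; the conglomerate id 8048 is taken from the stack trace below. A compiled plan caches a conglomerate id; "compress" swaps the id and invalidates plans only at the end, so a thread already blocked on the table lock wakes up and runs against a conglomerate that no longer exists.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class NoContainerRace {
    private final ReentrantLock tableLock = new ReentrantLock();
    private volatile long conglomerateId = 8048;  // the "store" id for table x

    // Runs the race once and returns what the blocked updater observes.
    String runRace() {
        final String[] outcome = new String[1];
        final CountDownLatch planCompiled = new CountDownLatch(1);

        tableLock.lock();                         // compress: LOCK TABLE ... EXCLUSIVE MODE
        Thread updater = new Thread(() -> {
            long cachedId = conglomerateId;       // plan compiled against the old conglomerate
            planCompiled.countDown();
            tableLock.lock();                     // blocks part way through its plan
            try {
                outcome[0] = (cachedId == conglomerateId)
                        ? "ok"
                        : "Container " + cachedId + " not found";
            } finally { tableLock.unlock(); }
        });
        updater.start();
        try {
            planCompiled.await();                 // the updater's plan is now fixed
            conglomerateId = 8049;                // compress swaps in a new conglomerate,
                                                  // invalidating plans only at the END
            tableLock.unlock();                   // commit releases the exclusive lock
            updater.join();
        } catch (InterruptedException e) { throw new RuntimeException(e); }
        return outcome[0];
    }
}
```

The outcome is deterministic because the updater caches the old id before the swap, exactly as a plan compiled before compress finishes would.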
Person A wrote:
The test first does "lock table datatypes exclusive mode" before
starting the compress. some of us thought if the compress had an
excl lock it would maybe solve things.
here is the problem.
20 threads are running, either inserting or deleting depending on
how many rows there currently are in the table. if we go too high
we start deleting.
when we drop below a low water mark, one thread does the
"lock table excl" then alter table compress. The other threads (19)
are part way into executing their delete and block getting a write
lock. they are part way into their query plan, right? bytecode or
before that?
compress eventually finishes, the Container and conglomerate id
change, the plans were invalidated, the test commits, and I assume
the lock is released at commit.
now some of the updater threads get the lock in turn and get the
"Container {N} not found" error. 14 errors, not 19. why not all 19,
don't know.
then everyone must recompile, because there are no more errors and
we continue on.
The question is, is there a way to recompile once you get your lock
but notice your plan is invalidated? is wait()/notify() used for
the locks? could we wake them, telling them to check their plan
validation?
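The wake-and-revalidate idea in that question can be sketched with a lock-and-condition simulation in which woken waiters re-check plan validity before resuming. All names are hypothetical; this is not Derby's lock manager, just an illustration of the pattern with java.util.concurrent primitives.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class RevalidateOnWakeup {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private boolean tableLocked = false;
    private volatile boolean planValid = true;

    void lockExclusive() {
        lock.lock();
        try { tableLocked = true; } finally { lock.unlock(); }
    }

    void invalidateAndUnlock() {
        lock.lock();
        try {
            planValid = false;          // invalidate BEFORE waking anyone
            tableLocked = false;
            released.signalAll();       // wake all blocked updaters
        } finally { lock.unlock(); }
    }

    // An updater blocks here; on wakeup it reports whether it must recompile.
    boolean acquireAndCheckStale() throws InterruptedException {
        lock.lock();
        try {
            while (tableLocked) released.await();
            return !planValid;          // stale => recompile, don't run the old plan
        } finally { lock.unlock(); }
    }

    // Demo: compress holds the lock, an updater blocks, compress invalidates
    // plans and releases; the woken updater sees it must recompile.
    static boolean demo() {
        RevalidateOnWakeup r = new RevalidateOnWakeup();
        r.lockExclusive();
        final boolean[] stale = new boolean[1];
        Thread updater = new Thread(() -> {
            try { stale[0] = r.acquireAndCheckStale(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        updater.start();
        r.invalidateAndUnlock();
        try { updater.join(); } catch (InterruptedException e) { }
        return stale[0];
    }
}
```

Because invalidation happens before the signal, a waiter can never wake up and see a valid-but-stale plan in this sketch; that ordering is exactly what the real race lacked.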
Person B replied:
I think you have what is going on nailed, but I have no idea how
to fix it. I think this is a known language issue, but am still
waiting on comment.
I think it is too late to stop and retry, if I am not
mistaken an
arbitrary query could have already begun returning rows to
the user
when it encounters this error (maybe not this case - but a
query with
a complicated join may).
It seems the "right" thing to do is to get locks on all
tables in a plan
up front before execution, and then check if the plan is
valid. I think
this has been considered too major to do.
No other ideas at this point other than getting a test
case, logging a
bug, and moving on.
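The "right thing" described here, acquiring locks on every table in the plan up front and only then checking validity, can be sketched as a retry loop. Hypothetical names throughout; this is not how Derby actually executes, which is the point of the proposal.

```java
public class LockThenValidate {
    static final class Plan {
        final long conglomerateId;
        volatile boolean valid = true;
        Plan(long id) { conglomerateId = id; }
    }

    volatile long storeConglomerate = 8048;  // id the store currently knows
    volatile Plan cached = new Plan(8048);

    Plan recompile() { return new Plan(storeConglomerate); }

    // Execute the cached plan, retrying compilation until the plan is valid
    // at the moment we hold the locks and before any rows are returned.
    long execute() {
        Plan plan = cached;
        while (true) {
            lockTables(plan);                    // may block, as in the repro
            if (plan.valid && plan.conglomerateId == storeConglomerate) {
                return plan.conglomerateId;      // safe: plan matches the store
            }
            unlockTables(plan);
            plan = recompile();                  // pick up the new conglomerate
            cached = plan;
        }
    }

    void lockTables(Plan p) {}   // lock acquisition elided in this sketch
    void unlockTables(Plan p) {}
}
```

The key property is that validity is checked after the locks are held, so an invalidation that raced with lock acquisition forces a recompile instead of a "Container not found" error. The cost, as noted above, is restructuring execution so no rows are returned before all locks are taken.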
Person C replied:
This is a classic race condition. The problem is that ALTER
TABLE
COMPRESS gets its exclusive lock near the beginning of its
execution,
but invalidates dependent plans near the end of its
execution.
We could either eliminate or narrow the window that allows
the race
condition by moving plan invalidation to the beginning of
the
execution of ALTER TABLE COMPRESS. We want it to be
impossible or
unlikely that an inserter or deleter can start executing
with a
conglomerate that's about to go away.
Another possibility would be for the store to provide a way
for
the new conglomerate to have the same conglomerate id as
the
old conglomerate. The store would also have to take care of
any
open conglomerate controllers and scans that used the old
conglomerate. I don't know the store well enough to say how
hard
this would be, but I'm guessing it would be very hard.
Person B then replies:
This would be very hard for store. In all these cases of
swapping out the
container and conglomerate the id is the unit of recovery
and using the
"same" id for something that may have to be recovered is
hard.
Also the same type of problem can come about if an index
exists on a
table, and then is dropped. If the plan tries to use the
index after it
has been dropped there is nothing the store can do in that
case.
Moving the invalidation up seems like a good idea, but as Person C
points out, it doesn't solve it if there is any time when another
thread can validate its plan and then start executing, block on a
lock, and when it wakes up find the plan is invalid.
And I made the change to move the invalidation before we start
moving rows from the old table to the new one.
This helps the test, but does not solve the real problem.
Hope this helps.
> Conglomerate does not exist after inserting large data volume
> --------------------------------------------------------------
>
> Key: DERBY-637
> URL: https://issues.apache.org/jira/browse/DERBY-637
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.2.1.6
> Environment: Solaris 10 Sparc
> Sun 1.5 VM
> Client/server DB
> 1 GB page cache
> JVM heap on server: min 1 GB, max 3 GB
> Reporter: Øystein Grøvlen
> Attachments: noContainerBug.java
>
>
> In a client/server environment I did as follows:
> 1. Started server
> 2. Dropped existing TPC-B tables and created new ones
> 3. Inserted data for 200 million accounts (30 GB account table)
> 4. When insertion was finished, tried to run a TPC-B transaction on same connection and was informed that conglomerate does not exist. (See stack trace below).
> 5. Stopped client, started a new client to run a TPC-B transaction, got same error
> 6. Restarted server
> 7. Ran client again, and everything worked fine.
> Stack trace from derby.log:
> 2005-10-19 18:47:41.838 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.839 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: UPDATE accounts SET abal = abal + ? WHERE aid = ? AND bid = ?
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
> at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
> at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
> at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
> at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
> at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
> at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
> at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
> at org.apache.derby.impl.sql.compile.QueryTreeNode.resolveTableToSynonym(QueryTreeNode.java:1510)
> at org.apache.derby.impl.sql.compile.UpdateNode.bind(UpdateNode.java:207)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
> at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(EmbedPreparedStatement20.java:82)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(EmbedPreparedStatement30.java:62)
> at org.apache.derby.jdbc.Driver30.newEmbedPreparedStatement(Driver30.java:92)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:678)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:575)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.derby.impl.drda.DRDAStatement.prepareStatementJDBC3(DRDAStatement.java:1497)
> at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:486)
> at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
> at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
> at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
> at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: call SYSIBM.SQLCAMESSAGE(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
> at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
> at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
> at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
> at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
> at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
> at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
> at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getRoutineList(DataDictionaryImpl.java:5766)
> at org.apache.derby.impl.sql.compile.StaticMethodCallNode.resolveRoutine(StaticMethodCallNode.java:303)
> at org.apache.derby.impl.sql.compile.StaticMethodCallNode.bindExpression(StaticMethodCallNode.java:192)
> at org.apache.derby.impl.sql.compile.JavaToSQLValueNode.bindExpression(JavaToSQLValueNode.java:250)
> at org.apache.derby.impl.sql.compile.CallStatementNode.bind(CallStatementNode.java:177)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
> at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
> at org.apache.derby.impl.jdbc.EmbedCallableStatement.<init>(EmbedCallableStatement.java:68)
> at org.apache.derby.impl.jdbc.EmbedCallableStatement20.<init>(EmbedCallableStatement20.java:78)
> at org.apache.derby.impl.jdbc.EmbedCallableStatement30.<init>(EmbedCallableStatement30.java:60)
> at org.apache.derby.jdbc.Driver30.newEmbedCallableStatement(Driver30.java:115)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:771)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:719)
> at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:475)
> at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
> at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
> at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
> at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.