Posted to derby-dev@db.apache.org by "Kathey Marsden (JIRA)" <ji...@apache.org> on 2008/11/19 19:06:44 UTC
[jira] Updated: (DERBY-637) Conglomerate does not exist after inserting large data volume
[ https://issues.apache.org/jira/browse/DERBY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kathey Marsden updated DERBY-637:
---------------------------------
Attachment: noContainerBug.java
Here is the old repro. It will need some work to run with current Derby, as compress table has changed, imports have changed, etc. 3653 has some interesting comments regarding the "fix", which seemed to just reduce the window of opportunity for this bug to occur. I don't know if things changed after 3653 or not. Below is the description and comments from the issue:
Description
An application forks 20 threads to update a table (inserts or
deletes depending on the number of rows in the table).
When the number of rows falls to a low water mark, one thread
will do
lock table x in exclusive mode
retrying until it succeeds, then
alter table x compress
The other threads are blocked trying to get read locks, part way
through executing their plan.
Near the end of its work, compress table invalidates plans on this
table, since the conglomerateId has changed for the underlying
store. However, the blocked threads are already using their invalid
plans, and when they get the lock they get the error
"Container {N} not found".
Notes:
I am not sure if this problem is already documented.
I submitted a "fix" which reduces the problem but does not
solve the known race problem with data dictionaries.
Instead of 14 errors we get 1 error now.
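Stripped of Derby internals, the interleaving above can be reproduced with a small, self-contained simulation. All class, method, and field names here are hypothetical; the conglomerate id 8048 is taken from the stack trace below. A compiled plan caches a conglomerate id; "compress" swaps the id and invalidates plans only at the end, so a thread already blocked on the table lock wakes up and runs against a conglomerate that no longer exists.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class NoContainerRace {
    private final ReentrantLock tableLock = new ReentrantLock();
    private volatile long conglomerateId = 8048;  // the "store" id for table x

    // Runs the race once and returns what the blocked updater observes.
    String runRace() {
        final String[] outcome = new String[1];
        final CountDownLatch planCompiled = new CountDownLatch(1);

        tableLock.lock();                         // compress: LOCK TABLE ... EXCLUSIVE MODE
        Thread updater = new Thread(() -> {
            long cachedId = conglomerateId;       // plan compiled against the old conglomerate
            planCompiled.countDown();
            tableLock.lock();                     // blocks part way through its plan
            try {
                outcome[0] = (cachedId == conglomerateId)
                        ? "ok"
                        : "Container " + cachedId + " not found";
            } finally { tableLock.unlock(); }
        });
        updater.start();
        try {
            planCompiled.await();                 // the updater's plan is now fixed
            conglomerateId = 8049;                // compress swaps in a new conglomerate,
                                                  // invalidating plans only at the END
            tableLock.unlock();                   // commit releases the exclusive lock
            updater.join();
        } catch (InterruptedException e) { throw new RuntimeException(e); }
        return outcome[0];
    }
}
```

The outcome is deterministic because the updater caches the old id before the swap, exactly as a plan compiled before compress finishes would.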
Person A wrote:
The test first does "lock table datatypes exclusive mode" before
starting the compress. some of us thought if the compress had an
excl lock it would maybe solve things.
here is the problem.
20 threads are running, either inserting or deleting depending on
how many rows there currently are in the table. if we go too high
we start deleting.
when we drop below a low water mark, one thread does the
"lock table excl" then alter table compress. The other threads (19)
are part way into executing their delete and block getting a write
lock. they are part way into their query plan, right? bytecode or
before that?
compress eventually finishes, the Container and conglomerate id
change, the plans were invalidated, the test commits, and I assume
the lock is released at commit.
now some of the updater threads get the lock in turn and get the
"Container {N} not found" error. 14 errors, not 19. why not all 19,
don't know.
then everyone must recompile, because there are no more errors and
we continue on.
The question is, is there a way to recompile once you get your lock
but notice your plan is invalidated? is wait()/notify() used for
the locks? could we wake them, telling them to check their plan
validation?
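The wake-and-revalidate idea in that question can be sketched with a lock-and-condition simulation in which woken waiters re-check plan validity before resuming. All names are hypothetical; this is not Derby's lock manager, just an illustration of the pattern with java.util.concurrent primitives.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class RevalidateOnWakeup {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private boolean tableLocked = false;
    private volatile boolean planValid = true;

    void lockExclusive() {
        lock.lock();
        try { tableLocked = true; } finally { lock.unlock(); }
    }

    void invalidateAndUnlock() {
        lock.lock();
        try {
            planValid = false;          // invalidate BEFORE waking anyone
            tableLocked = false;
            released.signalAll();       // wake all blocked updaters
        } finally { lock.unlock(); }
    }

    // An updater blocks here; on wakeup it reports whether it must recompile.
    boolean acquireAndCheckStale() throws InterruptedException {
        lock.lock();
        try {
            while (tableLocked) released.await();
            return !planValid;          // stale => recompile, don't run the old plan
        } finally { lock.unlock(); }
    }

    // Demo: compress holds the lock, an updater blocks, compress invalidates
    // plans and releases; the woken updater sees it must recompile.
    static boolean demo() {
        RevalidateOnWakeup r = new RevalidateOnWakeup();
        r.lockExclusive();
        final boolean[] stale = new boolean[1];
        Thread updater = new Thread(() -> {
            try { stale[0] = r.acquireAndCheckStale(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        updater.start();
        r.invalidateAndUnlock();
        try { updater.join(); } catch (InterruptedException e) { }
        return stale[0];
    }
}
```

Because invalidation happens before the signal, a waiter can never wake up and see a valid-but-stale plan in this sketch; that ordering is exactly what the real race lacked.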
Person B replied:
I think you have what is going on nailed, but I have no idea how
to fix it. I think this is a known language issue, but am still
waiting on comment.
I think it is too late to stop and retry, if I am not
mistaken an
arbitrary query could have already begun returning rows to
the user
when it encounters this error (maybe not this case - but a
query with
a complicated join may).
It seems the "right" thing to do is to get locks on all
tables in a plan
up front before execution, and then check if the plan is
valid. I think
this has been considered too major to do.
No other ideas at this point other than getting a test
case, logging a
bug, and moving on.
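The "right thing" described here, acquiring locks on every table in the plan up front and only then checking validity, can be sketched as a retry loop. Hypothetical names throughout; this is not how Derby actually executes, which is the point of the proposal.

```java
public class LockThenValidate {
    static final class Plan {
        final long conglomerateId;
        volatile boolean valid = true;
        Plan(long id) { conglomerateId = id; }
    }

    volatile long storeConglomerate = 8048;  // id the store currently knows
    volatile Plan cached = new Plan(8048);

    Plan recompile() { return new Plan(storeConglomerate); }

    // Execute the cached plan, retrying compilation until the plan is valid
    // at the moment we hold the locks and before any rows are returned.
    long execute() {
        Plan plan = cached;
        while (true) {
            lockTables(plan);                    // may block, as in the repro
            if (plan.valid && plan.conglomerateId == storeConglomerate) {
                return plan.conglomerateId;      // safe: plan matches the store
            }
            unlockTables(plan);
            plan = recompile();                  // pick up the new conglomerate
            cached = plan;
        }
    }

    void lockTables(Plan p) {}   // lock acquisition elided in this sketch
    void unlockTables(Plan p) {}
}
```

The key property is that validity is checked after the locks are held, so an invalidation that raced with lock acquisition forces a recompile instead of a "Container not found" error. The cost, as noted above, is restructuring execution so no rows are returned before all locks are taken.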
Person C replied:
This is a classic race condition. The problem is that ALTER
TABLE
COMPRESS gets its exclusive lock near the beginning of its
execution,
but invalidates dependent plans near the end of its
execution.
We could either eliminate or narrow the window that allows
the race
condition by moving plan invalidation to the beginning of
the
execution of ALTER TABLE COMPRESS. We want it to be
impossible or
unlikely that an inserter or deleter can start executing
with a
conglomerate that's about to go away.
Another possibility would be for the store to provide a way
for
the new conglomerate to have the same conglomerate id as
the
old conglomerate. The store would also have to take care of
any
open conglomerate controllers and scans that used the old
conglomerate. I don't know the store well enough to say how
hard
this would be, but I'm guessing it would be very hard.
Person B then replies:
This would be very hard for store. In all these cases of
swapping out the
container and conglomerate the id is the unit of recovery
and using the
"same" id for something that may have to be recovered is
hard.
Also the same type of problem can come about if an index
exists on a
table, and then is dropped. If the plan tries to use the
index after it
has been dropped there is nothing the store can do in that
case.
Moving the invalidation up seems like a good idea, but as Person C
points out, it doesn't solve it if there is any time when another
thread can validate its plan and then start executing, block on a
lock, and when it wakes up find the plan is invalid.
And I made the change to move the invalidation before we start
moving rows from the old table to the new one.
This helps the test, but does not solve the real problem.
Hope this helps.
> Conglomerate does not exist after inserting large data volume
> --------------------------------------------------------------
>
> Key: DERBY-637
> URL: https://issues.apache.org/jira/browse/DERBY-637
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.2.1.6
> Environment: Solaris 10 Sparc
> Sun 1.5 VM
> Client/server DB
> 1 GB page cache
> JVM heap on server: min 1 GB, max 3 GB
> Reporter: Øystein Grøvlen
> Attachments: noContainerBug.java
>
>
> In a client/server environment I did as follows:
> 1. Started server
> 2. Dropped existing TPC-B tables and created new ones
> 3. Inserted data for 200 million accounts (30 GB account table)
> 4. When insertion was finished, tried to run a TPC-B transaction on same connection and was informed that conglomerate does not exist. (See stack trace below).
> 5. Stopped client, started a new client to run a TPC-B transaction, got same error
> 6. Restarted server
> 7. Ran client again, and everything worked fine.
> Stack trace from derby.log:
> 2005-10-19 18:47:41.838 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.839 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: UPDATE accounts SET abal = abal + ? WHERE aid = ? AND bid = ?
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
> at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
> at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
> at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
> at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
> at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
> at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
> at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
> at org.apache.derby.impl.sql.compile.QueryTreeNode.resolveTableToSynonym(QueryTreeNode.java:1510)
> at org.apache.derby.impl.sql.compile.UpdateNode.bind(UpdateNode.java:207)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
> at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(EmbedPreparedStatement20.java:82)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(EmbedPreparedStatement30.java:62)
> at org.apache.derby.jdbc.Driver30.newEmbedPreparedStatement(Driver30.java:92)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:678)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:575)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.derby.impl.drda.DRDAStatement.prepareStatementJDBC3(DRDAStatement.java:1497)
> at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:486)
> at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
> at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
> at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
> at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Cleanup action starting
> 2005-10-19 18:47:41.983 GMT Thread[DRDAConnThread_3,5,main] (XID = 75504654), (SESSIONID = 0), (DATABASE = /export/home3/tmp/oysteing/tpcbdb), (DRDAID = NF000001.OB77-578992897558106193{1}), Failed Statement is: call SYSIBM.SQLCAMESSAGE(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
> ERROR XSAI2: The conglomerate (8,048) requested does not exist.
> at org.apache.derby.iapi.error.StandardException.newException(StandardException.java:311)
> at org.apache.derby.impl.store.access.heap.HeapConglomerateFactory.readConglomerate(HeapConglomerateFactory.java:224)
> at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(RAMAccessManager.java:486)
> at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(RAMTransaction.java:389)
> at org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(RAMTransaction.java:1315)
> at org.apache.derby.impl.store.access.btree.index.B2IForwardScan.init(B2IForwardScan.java:237)
> at org.apache.derby.impl.store.access.btree.index.B2I.openScan(B2I.java:750)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:530)
> at org.apache.derby.impl.store.access.RAMTransaction.openScan(RAMTransaction.java:1582)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(DataDictionaryImpl.java:7218)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getAliasDescriptor(DataDictionaryImpl.java:5697)
> at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getRoutineList(DataDictionaryImpl.java:5766)
> at org.apache.derby.impl.sql.compile.StaticMethodCallNode.resolveRoutine(StaticMethodCallNode.java:303)
> at org.apache.derby.impl.sql.compile.StaticMethodCallNode.bindExpression(StaticMethodCallNode.java:192)
> at org.apache.derby.impl.sql.compile.JavaToSQLValueNode.bindExpression(JavaToSQLValueNode.java:250)
> at org.apache.derby.impl.sql.compile.CallStatementNode.bind(CallStatementNode.java:177)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:333)
> at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:107)
> at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:704)
> at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:118)
> at org.apache.derby.impl.jdbc.EmbedCallableStatement.<init>(EmbedCallableStatement.java:68)
> at org.apache.derby.impl.jdbc.EmbedCallableStatement20.<init>(EmbedCallableStatement20.java:78)
> at org.apache.derby.impl.jdbc.EmbedCallableStatement30.<init>(EmbedCallableStatement30.java:60)
> at org.apache.derby.jdbc.Driver30.newEmbedCallableStatement(Driver30.java:115)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:771)
> at org.apache.derby.impl.jdbc.EmbedConnection.prepareCall(EmbedConnection.java:719)
> at org.apache.derby.impl.drda.DRDAStatement.prepare(DRDAStatement.java:475)
> at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(DRDAStatement.java:444)
> at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(DRDAConnThread.java:3132)
> at org.apache.derby.impl.drda.DRDAConnThread.processCommands(DRDAConnThread.java:673)
> at org.apache.derby.impl.drda.DRDAConnThread.run(DRDAConnThread.java:214)
> Cleanup action completed
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.