You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by Nathan Jensen <nd...@gmail.com> on 2022/09/28 21:12:43 UTC

deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Hi, we are using Derby as the backing message store for Qpid-Broker-J.  We
have a continuous data flow into Qpid so each message gets persisted to the
Derby message store as data comes in and then it is deleted from the Derby
message store as data is picked up and processed.  The data flow is pretty
constant, there are peaks and valleys but basically data is always coming
in so there are LOTS of inserts and deletes.  Over time we found that the
Derby message store fills up the disk drive until it runs out of space and
in general the workaround to that has been to stop Qpid and remove the
message store (i.e. the Derby directory), which loses data.

My attempted solution to this was to extend Qpid's DerbyMessageStore with a
DerbyRepackingMessageStore that executes the
SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However, on a
test system with flowing data, this always has a deadlock exception and
some data (the aborted transaction) is lost.  An example is:

Caused by: org.apache.derby.shared.common.error.StandardException: A lock
could not be obtained due to a deadlock, cycle of locks and waiters is:
Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
  Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
message_id , meta_data ) values (?, ?)
  Granted XID : {501214829, X}
Lock : ROW, SYSCONGLOMERATES, (7,8)
  Waiting XID : {501214829, X} , APP, alter table
"APP"."QPID_MESSAGE_METADATA" compress
  Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
. The selected victim is XID : 501214885.
        at
org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
        at
org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
        at
org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
        at
org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)

My understanding of that is that the deadlock is occurring because the
insert has a shared lock on the sysconglomerates row and wants an exclusive
lock on the qpid_message_metadata table, while the compress has an
exclusive lock on the qpid_message_metadata table and wants an exclusive
lock on the sysconglomerates row, so they block each other.

I found this thread from May of 2009 which had a similar issue:
https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
expect the compress command to go for the lock(s), wait until it gets the
lock(s), then block the other threads and do the compress.  But it
deadlocks instead.

Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683, I
tried using a non-autocommit connection and locking the table exclusively
before calling compress and then commit, but that got the same errors.
Based on this thread,
https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
the in place compress and it succeeds without deadlock.  However, it is
significantly slower, too slow to be a workable solution for my use case
and it also doesn't return as much space to the operating system.

Does this look like a Derby bug where the compress command will deadlock if
there are inserts happening at roughly the same time?  Or is this
expected?  Do both the inserts and compress need a sysconglomerates row
lock?  Is there a way the compress command could get both its locks at the
same time to avoid the deadlock?

Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Posted by Bryan Pendleton <bp...@gmail.com>.
A wild guess: if the INSERT transaction doesn't commit immediately,
but rather inserts multiple rows in a group before committing them
all, then maybe the transaction gets too large, and then once there
have been enough row locks taken by inserting the individual rows, the
database is switching from row locking to table locking, and that's
what causes the deadlock.

As an experiment, try committing the INSERT transaction more
frequently, even (as an extreme) committing after every row inserted,
and see if that avoids the deadlock.

Switching the discussion slightly, one technique that people often use
instead of trying to re-use the same table by deleting rows and
compressing in a way that keeps pace with the insert load, is to use
multiple tables, and have the insert switch to an alternate table
after some time, then the cleaning program can delete the old rows and
compress, and then the two tasks and flip back.

So maybe you could enhance QPID's DerbyMessageStore so that it uses a
collection of tables, not just a single table, so that it can arrange
to have the compress work operating on a different table than the
insert work.

bryan

On Thu, Sep 29, 2022 at 11:28 AM Rick Hillegas <ri...@gmail.com> wrote:
>
> I'm not an expert on the table compression code, but it does need an overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
>
> If I'm interpreting the error correctly, the table compressor is blocked trying to escalate its lock on the SYSCONGLOMERATES row from shared to exclusive. But it can't do this because the INSERT transaction holds a shared lock on the SYSCONGLOMERATES row. I don't understand why the INSERT transaction would need to lock a SYSCONGLOMERATES row. The following simple experiment shows that, by itself, a simple INSERT should not lock any rows in SYSCONGLOMERATES:
>
> ij version 10.17
>
> ij> CONNECT 'jdbc:derby:memory:db;create=true';
>
> ij> AUTOCOMMIT OFF;
>
> ij> CREATE TABLE t(a INT);
>
> 0 rows inserted/updated/deleted
>
> ij> CREATE INDEX t_idx ON t(a);
>
> 0 rows inserted/updated/deleted
>
> ij> CREATE TABLE s(a INT);
>
> 0 rows inserted/updated/deleted
>
> ij> INSERT INTO s VALUES (1);
>
> 1 row inserted/updated/deleted
>
> ij> COMMIT;
>
> ij> INSERT INTO t SELECT * FROM s;
>
> 1 row inserted/updated/deleted
>
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 186            |TABLE|IX  |T                                                                                                                               |Tablelock           |GRANT|T        |2    |NULL
>
> 186            |ROW  |X   |T                                                                                                                               |(1,7)               |GRANT|T        |1    |NULL
>
> 2 rows selected
>
> ij> COMMIT;
>
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 0 rows selected
>
>
> Is there anything else going on in the INSERT transaction? Can you share the table DDL with us, including indexes, constraints, auto-generated columns, etc.?
>
> -Rick
>
>
> On 9/28/22 2:12 PM, Nathan Jensen wrote:
>
> Hi, we are using Derby as the backing message store for Qpid-Broker-J.  We
> have a continuous data flow into Qpid so each message gets persisted to the
> Derby message store as data comes in and then it is deleted from the Derby
> message store as data is picked up and processed.  The data flow is pretty
> constant, there are peaks and valleys but basically data is always coming
> in so there are LOTS of inserts and deletes.  Over time we found that the
> Derby message store fills up the disk drive until it runs out of space and
> in general the workaround to that has been to stop Qpid and remove the
> message store (i.e. the Derby directory), which loses data.
>
> My attempted solution to this was to extend Qpid's DerbyMessageStore with a
> DerbyRepackingMessageStore that executes the
> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However, on a
> test system with flowing data, this always has a deadlock exception and
> some data (the aborted transaction) is lost.  An example is:
>
> Caused by: org.apache.derby.shared.common.error.StandardException: A lock
> could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>   Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
> message_id , meta_data ) values (?, ?)
>   Granted XID : {501214829, X}
> Lock : ROW, SYSCONGLOMERATES, (7,8)
>   Waiting XID : {501214829, X} , APP, alter table
> "APP"."QPID_MESSAGE_METADATA" compress
>   Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> . The selected victim is XID : 501214885.
>         at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
>         at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
>         at
> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
>         at
> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
>
> My understanding of that is that the deadlock is occurring because the
> insert has a shared lock on the sysconglomerates row and wants an exclusive
> lock on the qpid_message_metadata table, while the compress has an
> exclusive lock on the qpid_message_metadata table and wants an exclusive
> lock on the sysconglomerates row, so they block each other.
>
> I found this thread from May of 2009 which had a similar issue:
> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
> expect the compress command to go for the lock(s), wait until it gets the
> lock(s), then block the other threads and do the compress.  But it
> deadlocks instead.
>
> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683, I
> tried using a non-autocommit connection and locking the table exclusively
> before calling compress and then commit, but that got the same errors.
> Based on this thread,
> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> the in place compress and it succeeds without deadlock.  However, it is
> significantly slower, too slow to be a workable solution for my use case
> and it also doesn't return as much space to the operating system.
>
> Does this look like a Derby bug where the compress command will deadlock if
> there are inserts happening at roughly the same time?  Or is this
> expected?  Do both the inserts and compress need a sysconglomerates row
> lock?  Is there a way the compress command could get both its locks at the
> same time to avoid the deadlock?
>
>

Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Posted by Rick Hillegas <ri...@gmail.com>.
Thanks for instrumenting your program to print out the lock table just 
before the compression command.

I'm afraid that the lock table printout doesn't shed much light on the 
problem: I don't see the lock which the INSERT transaction holds on the 
SYSCONGLOMERATES row at compression time.

Bryan makes a number of good suggestions for working around this deadlock.


On 9/30/22 2:16 PM, Nathan Jensen wrote:
> Ok, below is the log output for the first two compresses.  Running select
> right before the first compress showed no locks so I included the output
> for the second compress as that was more interesting, though the
> transaction ids don't match.  I removed a bunch of normal qpid data flow
> logging stuff to shorten the output for clarity, and I formatted the locks
> printout to hopefully be more readable.
>
> -------------------------------------
>
> 2022-09-30T20:48:32,186Z INFO  [RepackTimer]
> (o.a.q.s.d.DerbyRepackingMessageStore) - connection.getAutoCommit() is false
> 2022-09-30T20:48:32,186Z INFO  [RepackTimer]
> (o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table
> 2022-09-30T20:48:32,259Z INFO  [RepackTimer]
> (o.a.q.s.d.DerbyRepackingMessageStore) - Locks are:
>
> 2022-09-30T20:48:32,260Z INFO  [RepackTimer]
> (o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_queue_entries
> 2022-09-30T20:48:52,554Z ERROR [IO-/147.18.139.60:45964]
> (o.a.q.s.d.DerbyRepackingMessageStore) - Failed to enqueue messages
> java.sql.SQLTransactionRollbackException: A lock could not be obtained due
> to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_QUEUE_ENTRIES, Tablelock
>    Waiting XID : {627716261, IX} , APP, INSERT INTO QPID_QUEUE_ENTRIES
> (queue_id, message_id) values (?,?)
>    Granted XID : {627715588, X}
> Lock : ROW, SYSCONGLOMERATES, (7,6)
>    Waiting XID : {627715588, X} , APP, alter table
> "APP"."QPID_QUEUE_ENTRIES" compress
>    Granted XID : {627715588, S} , {627716262, S} , {627716261, S}
> . The selected victim is XID : 627716261.
> at
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100)
> at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360)
> at
> org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405)
> at
> org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88)
> at
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152)
> at
> org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41)
> at
> org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56)
> at
> org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740)
> at
> org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.enqueueMessages(AbstractJDBCMessageStore.java:684)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$700(AbstractJDBCMessageStore.java:81)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$new$0(AbstractJDBCMessageStore.java:1226)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278)
> at
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
> at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
> at
> org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)
>
> 2022-09-30T20:49:12,632Z INFO  [RepackTimer]
> (o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table
> 2022-09-30T20:49:12,644Z INFO  [RepackTimer]
> (o.a.q.s.d.DerbyRepackingMessageStore) - Locks are:
>       627716294     TABLE     IX     QPID_MESSAGE_CONTENT      Tablelock
> GRANT     T     2     null
>       627716294     ROW        X     QPID_MESSAGE_METADATA     (118,8)
> GRANT     T     1     null
>       627716294     ROW        X     QPID_QUEUE_ENTRIES        (96,32)
> GRANT     T     1     null
>       627716294     TABLE     IX     QPID_QUEUE_ENTRIES        Tablelock
> GRANT     T     2     null
>       627716285     TABLE     IX     QPID_MESSAGE_METADATA     Tablelock
> GRANT     T     2     null
>       627716294     TABLE     IX     QPID_MESSAGE_METADATA     Tablelock
> GRANT     T     2     null
>       627716285     ROW        X     QPID_MESSAGE_METADATA     (150,13)
>   GRANT     T     2     null
>       627716294     ROW        X     QPID_MESSAGE_CONTENT      (81,85)
> GRANT     T     1     null
> 2022-09-30T20:49:12,644Z INFO  [RepackTimer]
> (o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_message_metadata
>
> 2022-09-30T20:49:32,682Z ERROR [IO-/147.18.139.173:44710]
> (o.a.q.s.u.ServerScopedRuntimeException) - Exception on enqueuing message
> into message store25904045
> java.sql.SQLTransactionRollbackException: A lock could not be obtained due
> to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>    Waiting XID : {627716335, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
> message_id , meta_data ) values (?, ?)
>    Granted XID : {627716296, X}
> Lock : ROW, SYSCONGLOMERATES, (7,8)
>    Waiting XID : {627716296, X} , APP, alter table
> "APP"."QPID_MESSAGE_METADATA" compress
>    Granted XID : {627716296, S} , {627716335, S}
> . The selected victim is XID : 627716335.
> at
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100)
> at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431)
> at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360)
> at
> org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405)
> at
> org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88)
> at
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152)
> at
> org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41)
> at
> org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56)
> at
> org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740)
> at
> org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.storeMetaData(AbstractJDBCMessageStore.java:981)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$1200(AbstractJDBCMessageStore.java:81)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$StoredJDBCMessage.store(AbstractJDBCMessageStore.java:1635)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$enqueueMessage$1(AbstractJDBCMessageStore.java:1240)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289)
> at
> org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278)
> at
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
> at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
>
> On Fri, Sep 30, 2022 at 2:46 PM Rick Hillegas <ri...@gmail.com>
> wrote:
>
>> I don't have any theories yet about what is going on. Can you try
>> issuing the following statement just before you compress the table:
>>
>> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>>
>> I don't understand why the INSERT transaction grabs a lock on a
>> SYSCONGLOMERATES row.
>>
>> Thanks,
>> -Rick
>>
>>
>> On 9/29/22 1:24 PM, Nathan Jensen wrote:
>>> Thanks for the response!  To clarify a few things, we're using Derby
>>> 10.15.2.0 and Java 11.  Our project is sticking with Java 11 for the time
>>> being so I can't upgrade to Derby 10.16 at this time.  The version of
>> Qpid
>>> we're using at present is 7.1.12.
>>>
>>> I don't see anything suspicious regarding the INSERT statement, but maybe
>>> you could see something that I don't.  I believe the relevant code is
>> here:
>> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675
>>> That does not commit the transaction, the transaction is committed
>>> elsewhere in the same file, here:
>>>
>> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264
>>> I should also note that one time I saw the deadlock happen with a DELETE
>>> statement instead of an INSERT, so I don't think the unexpected row
>> locking
>>> in SYSCONGLOMERATES is limited to INSERT.
>>>
>>> Here is the DDL as produced by Derby's dblook:
>>> -- ----------------------------------------------
>>> -- DDL Statements for tables
>>> -- ----------------------------------------------
>>>
>>> CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL,
>>> "MESSAGE_ID" BIGINT NOT NULL);
>>>
>>> CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL,
>>> "CONTENT" BLOB(2147483647));
>>>
>>> CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL,
>>> "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64)
>>> FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID"
>>> VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);
>>>
>>> CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL,
>>> "REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647),
>>> "LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET"
>> BLOB(2147483647));
>>> CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL,
>>> "META_DATA" BLOB(2147483647));
>>>
>>> CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL);
>>>
>>> CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID"
>>> VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA
>>> NOT NULL);
>>>
>>> CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT
>>> NULL, "VERSION_TIME" TIMESTAMP);
>>>
>>> -- ----------------------------------------------
>>> -- DDL Statements for keys
>>> -- ----------------------------------------------
>>>
>>> -- PRIMARY/UNIQUE
>>> ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT
>>> "SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY
>>> ("LINK_KEY");
>>>
>>> ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT
>>> "SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY
>>> ("MESSAGE_ID");
>>>
>>> ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT
>>> "SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY
>>> ("QUEUE_ID", "MESSAGE_ID");
>>>
>>> ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT
>>> "SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY
>> ("FORMAT",
>>> "GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID");
>>>
>>> ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT
>>> "SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY
>> ("FORMAT",
>>> "GLOBAL_ID", "BRANCH_ID");
>>>
>>> ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT
>>> "SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY
>>> ("MESSAGE_ID");
>>>
>>> ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT
>>> "SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY
>>> ("VERSION");
>>>
>>>
>>> And just in case I'm doing something stupid, here is the current version
>> of
>>> my code.  conn.getAutoCommit() is false, but I also tried it with an
>>> autocommit connection and without the commit statements that time.  The
>>> code is inside a try-catch-finally that is inside a TimerTask.
>>>
>>> conn = newConnection();
>>> LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit());
>>> LOGGER.info("Compressing qpid_queue_entries");
>>> conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
>>> 'QPID_QUEUE_ENTRIES', 0)").executeUpdate();
>>> conn.commit();
>>> LOGGER.info("Compressing qpid_message_metadata");
>>> conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
>>> 'QPID_MESSAGE_METADATA', 0)").executeUpdate();
>>> conn.commit();
>>> LOGGER.info("Compressing qpid_message_content");
>>> conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
>>> 'QPID_MESSAGE_CONTENT', 0)").executeUpdate();
>>> conn.commit();
>>>
>>> Thanks for any help you can provide!
>>>
>>> On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <ri...@gmail.com>
>>> wrote:
>>>
>>>> I'm not an expert on the table compression code, but it does need an
>>>> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
>>>>
>>>> If I'm interpreting the error correctly, the table compressor is blocked
>>>> trying to escalate its lock on the SYSCONGLOMERATES row from shared to
>>>> exclusive. But it can't do this because the INSERT transaction holds a
>>>> shared lock on the SYSCONGLOMERATES row. I don't understand why the
>> INSERT
>>>> transaction would need to lock a SYSCONGLOMERATES row. The following
>> simple
>>>> experiment shows that, by itself, a simple INSERT should not lock any
>> rows
>>>> in SYSCONGLOMERATES:
>>>>
>>>> ij version 10.17
>>>>
>>>> ij> CONNECT 'jdbc:derby:memory:db;create=true';
>>>>
>>>> ij> AUTOCOMMIT OFF;
>>>>
>>>> ij> CREATE TABLE t(a INT);
>>>>
>>>> 0 rows inserted/updated/deleted
>>>>
>>>> ij> CREATE INDEX t_idx ON t(a);
>>>>
>>>> 0 rows inserted/updated/deleted
>>>>
>>>> ij> CREATE TABLE s(a INT);
>>>>
>>>> 0 rows inserted/updated/deleted
>>>>
>>>> ij> INSERT INTO s VALUES (1);
>>>>
>>>> 1 row inserted/updated/deleted
>>>>
>>>> ij> COMMIT;
>>>>
>>>> ij> INSERT INTO t SELECT * FROM s;
>>>>
>>>> 1 row inserted/updated/deleted
>>>>
>>>> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>>>>
>>>> XID            |TYPE |MODE|TABLENAME
>>         |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>>>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> 186            |TABLE|IX  |T
>>         |Tablelock           |GRANT|T        |2    |NULL
>>>> 186            |ROW  |X   |T
>>         |(1,7)               |GRANT|T        |1    |NULL
>>>> 2 rows selected
>>>>
>>>> ij> COMMIT;
>>>>
>>>> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>>>>
>>>> XID            |TYPE |MODE|TABLENAME
>>         |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>>>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> 0 rows selected
>>>>
>>>>
>>>> Is there anything else going on in the INSERT transaction? Can you share
>>>> the table DDL with us, including indexes, constraints, auto-generated
>>>> columns, etc.?
>>>>
>>>> -Rick
>>>>
>>>>
>>>> On 9/28/22 2:12 PM, Nathan Jensen wrote:
>>>>
>>>> Hi, we are using Derby as the backing message store for Qpid-Broker-J.
>> We
>>>> have a continuous data flow into Qpid so each message gets persisted to
>> the
>>>> Derby message store as data comes in and then it is deleted from the
>> Derby
>>>> message store as data is picked up and processed.  The data flow is
>> pretty
>>>> constant, there are peaks and valleys but basically data is always
>> coming
>>>> in so there are LOTS of inserts and deletes.  Over time we found that
>> the
>>>> Derby message store fills up the disk drive until it runs out of space
>> and
>>>> in general the workaround to that has been to stop Qpid and remove the
>>>> message store (i.e. the Derby directory), which loses data.
>>>>
>>>> My attempted solution to this was to extend Qpid's DerbyMessageStore
>> with a
>>>> DerbyRepackingMessageStore that executes the
>>>> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However,
>> on a
>>>> test system with flowing data, this always has a deadlock exception and
>>>> some data (the aborted transaction) is lost.  An example is:
>>>>
>>>> Caused by: org.apache.derby.shared.common.error.StandardException: A
>> lock
>>>> could not be obtained due to a deadlock, cycle of locks and waiters is:
>>>> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>>>>     Waiting XID : {501214885, IX} , APP, INSERT INTO
>> QPID_MESSAGE_METADATA(
>>>> message_id , meta_data ) values (?, ?)
>>>>     Granted XID : {501214829, X}
>>>> Lock : ROW, SYSCONGLOMERATES, (7,8)
>>>>     Waiting XID : {501214829, X} , APP, alter table
>>>> "APP"."QPID_MESSAGE_METADATA" compress
>>>>     Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
>>>> . The selected victim is XID : 501214885.
>>>>           at
>>>>
>> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
>>>>           at
>>>>
>> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
>>>>           at
>>>>
>> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
>>>>           at
>>>>
>> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
>>>> My understanding of that is that the deadlock is occurring because the
>>>> insert has a shared lock on the sysconglomerates row and wants an
>> exclusive
>>>> lock on the qpid_message_metadata table, while the compress has an
>>>> exclusive lock on the qpid_message_metadata table and wants an exclusive
>>>> lock on the sysconglomerates row, so they block each other.
>>>>
>>>> I found this thread from May of 2009 which had a similar issue:
>> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
>>>> expect the compress command to go for the lock(s), wait until it gets
>> the
>>>> lock(s), then block the other threads and do the compress.  But it
>>>> deadlocks instead.
>>>>
>>>> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683,
>> I
>>>> tried using a non-autocommit connection and locking the table
>> exclusively
>>>> before calling compress and then commit, but that got the same errors.
>>>> Based on this thread,
>> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
>>>> the in place compress and it succeeds without deadlock.  However, it is
>>>> significantly slower, too slow to be a workable solution for my use case
>>>> and it also doesn't return as much space to the operating system.
>>>>
>>>> Does this look like a Derby bug where the compress command will
>> deadlock if
>>>> there are inserts happening at roughly the same time?  Or is this
>>>> expected?  Do both the inserts and compress need a sysconglomerates row
>>>> lock?  Is there a way the compress command could get both its locks at
>> the
>>>> same time to avoid the deadlock?
>>>>
>>>>
>>>>
>>>>
>>


Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Posted by Nathan Jensen <nd...@gmail.com>.
Ok, below is the log output for the first two compresses.  Running select
right before the first compress showed no locks so I included the output
for the second compress as that was more interesting, though the
transaction ids don't match.  I removed a bunch of normal qpid data flow
logging stuff to shorten the output for clarity, and I formatted the locks
printout to hopefully be more readable.

-------------------------------------

2022-09-30T20:48:32,186Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - connection.getAutoCommit() is false
2022-09-30T20:48:32,186Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table
2022-09-30T20:48:32,259Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Locks are:

2022-09-30T20:48:32,260Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_queue_entries
2022-09-30T20:48:52,554Z ERROR [IO-/147.18.139.60:45964]
(o.a.q.s.d.DerbyRepackingMessageStore) - Failed to enqueue messages
java.sql.SQLTransactionRollbackException: A lock could not be obtained due
to a deadlock, cycle of locks and waiters is:
Lock : TABLE, QPID_QUEUE_ENTRIES, Tablelock
  Waiting XID : {627716261, IX} , APP, INSERT INTO QPID_QUEUE_ENTRIES
(queue_id, message_id) values (?,?)
  Granted XID : {627715588, X}
Lock : ROW, SYSCONGLOMERATES, (7,6)
  Waiting XID : {627715588, X} , APP, alter table
"APP"."QPID_QUEUE_ENTRIES" compress
  Granted XID : {627715588, S} , {627716262, S} , {627716261, S}
. The selected victim is XID : 627716261.
at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360)
at
org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405)
at
org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41)
at
org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.enqueueMessages(AbstractJDBCMessageStore.java:684)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$700(AbstractJDBCMessageStore.java:81)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$new$0(AbstractJDBCMessageStore.java:1226)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278)
at
org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)
at
org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117)

2022-09-30T20:49:12,632Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Running select from lock table
2022-09-30T20:49:12,644Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Locks are:
     627716294     TABLE     IX     QPID_MESSAGE_CONTENT      Tablelock
GRANT     T     2     null
     627716294     ROW        X     QPID_MESSAGE_METADATA     (118,8)
GRANT     T     1     null
     627716294     ROW        X     QPID_QUEUE_ENTRIES        (96,32)
GRANT     T     1     null
     627716294     TABLE     IX     QPID_QUEUE_ENTRIES        Tablelock
GRANT     T     2     null
     627716285     TABLE     IX     QPID_MESSAGE_METADATA     Tablelock
GRANT     T     2     null
     627716294     TABLE     IX     QPID_MESSAGE_METADATA     Tablelock
GRANT     T     2     null
     627716285     ROW        X     QPID_MESSAGE_METADATA     (150,13)
 GRANT     T     2     null
     627716294     ROW        X     QPID_MESSAGE_CONTENT      (81,85)
GRANT     T     1     null
2022-09-30T20:49:12,644Z INFO  [RepackTimer]
(o.a.q.s.d.DerbyRepackingMessageStore) - Compressing qpid_message_metadata

2022-09-30T20:49:32,682Z ERROR [IO-/147.18.139.173:44710]
(o.a.q.s.u.ServerScopedRuntimeException) - Exception on enqueuing message
into message store25904045
java.sql.SQLTransactionRollbackException: A lock could not be obtained due
to a deadlock, cycle of locks and waiters is:
Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
  Waiting XID : {627716335, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
message_id , meta_data ) values (?, ?)
  Granted XID : {627716296, X}
Lock : ROW, SYSCONGLOMERATES, (7,8)
  Waiting XID : {627716296, X} , APP, alter table
"APP"."QPID_MESSAGE_METADATA" compress
  Granted XID : {627716296, S} , {627716335, S}
. The selected victim is XID : 627716335.
at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:100)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Util.java:230)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(TransactionResourceImpl.java:431)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(TransactionResourceImpl.java:360)
at
org.apache.derby.impl.jdbc.EmbedConnection.handleException(EmbedConnection.java:2405)
at
org.apache.derby.impl.jdbc.ConnectionChild.handleException(ConnectionChild.java:88)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(EmbedPreparedStatement.java:152)
at
org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(EmbedPreparedStatement42.java:41)
at
org.apache.derby.iapi.jdbc.Driver42.newEmbedPreparedStatement(Driver42.java:56)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1740)
at
org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(EmbedConnection.java:1568)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.storeMetaData(AbstractJDBCMessageStore.java:981)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore.access$1200(AbstractJDBCMessageStore.java:81)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$StoredJDBCMessage.store(AbstractJDBCMessageStore.java:1635)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.lambda$enqueueMessage$1(AbstractJDBCMessageStore.java:1240)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.doPreCommitActions(AbstractJDBCMessageStore.java:1289)
at
org.apache.qpid.server.store.jdbc.AbstractJDBCMessageStore$JDBCTransaction.commitTranAsync(AbstractJDBCMessageStore.java:1278)
at
org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294)
at org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124)

On Fri, Sep 30, 2022 at 2:46 PM Rick Hillegas <ri...@gmail.com>
wrote:

> I don't have any theories yet about what is going on. Can you try
> issuing the following statement just before you compress the table:
>
> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> I don't understand why the INSERT transaction grabs a lock on a
> SYSCONGLOMERATES row.
>
> Thanks,
> -Rick
>
>
> On 9/29/22 1:24 PM, Nathan Jensen wrote:
> > Thanks for the response!  To clarify a few things, we're using Derby
> > 10.15.2.0 and Java 11.  Our project is sticking with Java 11 for the time
> > being so I can't upgrade to Derby 10.16 at this time.  The version of
> Qpid
> > we're using at present is 7.1.12.
> >
> > I don't see anything suspicious regarding the INSERT statement, but maybe
> > you could see something that I don't.  I believe the relevant code is
> here:
> >
> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675
> > That does not commit the transaction, the transaction is committed
> > elsewhere in the same file, here:
> >
> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264
> > I should also note that one time I saw the deadlock happen with a DELETE
> > statement instead of an INSERT, so I don't think the unexpected row
> locking
> > in SYSCONGLOMERATES is limited to INSERT.
> >
> > Here is the DDL as produced by Derby's dblook:
> > -- ----------------------------------------------
> > -- DDL Statements for tables
> > -- ----------------------------------------------
> >
> > CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL,
> > "MESSAGE_ID" BIGINT NOT NULL);
> >
> > CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL,
> > "CONTENT" BLOB(2147483647));
> >
> > CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL,
> > "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64)
> > FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID"
> > VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);
> >
> > CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL,
> > "REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647),
> > "LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET"
> BLOB(2147483647));
> >
> > CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL,
> > "META_DATA" BLOB(2147483647));
> >
> > CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL);
> >
> > CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID"
> > VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA
> > NOT NULL);
> >
> > CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT
> > NULL, "VERSION_TIME" TIMESTAMP);
> >
> > -- ----------------------------------------------
> > -- DDL Statements for keys
> > -- ----------------------------------------------
> >
> > -- PRIMARY/UNIQUE
> > ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT
> > "SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("LINK_KEY");
> >
> > ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT
> > "SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("MESSAGE_ID");
> >
> > ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT
> > "SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("QUEUE_ID", "MESSAGE_ID");
> >
> > ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT
> > "SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("FORMAT",
> > "GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID");
> >
> > ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT
> > "SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("FORMAT",
> > "GLOBAL_ID", "BRANCH_ID");
> >
> > ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT
> > "SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("MESSAGE_ID");
> >
> > ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT
> > "SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY
> > ("VERSION");
> >
> >
> > And just in case I'm doing something stupid, here is the current version
> of
> > my code.  conn.getAutoCommit() is false, but I also tried it with an
> > autocommit connection and without the commit statements that time.  The
> > code is inside a try-catch-finally that is inside a TimerTask.
> >
> > conn = newConnection();
> > LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit());
> > LOGGER.info("Compressing qpid_queue_entries");
> > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> > 'QPID_QUEUE_ENTRIES', 0)").executeUpdate();
> > conn.commit();
> > LOGGER.info("Compressing qpid_message_metadata");
> > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> > 'QPID_MESSAGE_METADATA', 0)").executeUpdate();
> > conn.commit();
> > LOGGER.info("Compressing qpid_message_content");
> > conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> > 'QPID_MESSAGE_CONTENT', 0)").executeUpdate();
> > conn.commit();
> >
> > Thanks for any help you can provide!
> >
> > On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <ri...@gmail.com>
> > wrote:
> >
> >> I'm not an expert on the table compression code, but it does need an
> >> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
> >>
> >> If I'm interpreting the error correctly, the table compressor is blocked
> >> trying to escalate its lock on the SYSCONGLOMERATES row from shared to
> >> exclusive. But it can't do this because the INSERT transaction holds a
> >> shared lock on the SYSCONGLOMERATES row. I don't understand why the
> INSERT
> >> transaction would need to lock a SYSCONGLOMERATES row. The following
> simple
> >> experiment shows that, by itself, a simple INSERT should not lock any
> rows
> >> in SYSCONGLOMERATES:
> >>
> >> ij version 10.17
> >>
> >> ij> CONNECT 'jdbc:derby:memory:db;create=true';
> >>
> >> ij> AUTOCOMMIT OFF;
> >>
> >> ij> CREATE TABLE t(a INT);
> >>
> >> 0 rows inserted/updated/deleted
> >>
> >> ij> CREATE INDEX t_idx ON t(a);
> >>
> >> 0 rows inserted/updated/deleted
> >>
> >> ij> CREATE TABLE s(a INT);
> >>
> >> 0 rows inserted/updated/deleted
> >>
> >> ij> INSERT INTO s VALUES (1);
> >>
> >> 1 row inserted/updated/deleted
> >>
> >> ij> COMMIT;
> >>
> >> ij> INSERT INTO t SELECT * FROM s;
> >>
> >> 1 row inserted/updated/deleted
> >>
> >> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> >>
> >> XID            |TYPE |MODE|TABLENAME
>
>        |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
> >>
> >>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> 186            |TABLE|IX  |T
>
>        |Tablelock           |GRANT|T        |2    |NULL
> >>
> >> 186            |ROW  |X   |T
>
>        |(1,7)               |GRANT|T        |1    |NULL
> >>
> >> 2 rows selected
> >>
> >> ij> COMMIT;
> >>
> >> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
> >>
> >> XID            |TYPE |MODE|TABLENAME
>
>        |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
> >>
> >>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> 0 rows selected
> >>
> >>
> >> Is there anything else going on in the INSERT transaction? Can you share
> >> the table DDL with us, including indexes, constraints, auto-generated
> >> columns, etc.?
> >>
> >> -Rick
> >>
> >>
> >> On 9/28/22 2:12 PM, Nathan Jensen wrote:
> >>
> >> Hi, we are using Derby as the backing message store for Qpid-Broker-J.
> We
> >> have a continuous data flow into Qpid so each message gets persisted to
> the
> >> Derby message store as data comes in and then it is deleted from the
> Derby
> >> message store as data is picked up and processed.  The data flow is
> pretty
> >> constant, there are peaks and valleys but basically data is always
> coming
> >> in so there are LOTS of inserts and deletes.  Over time we found that
> the
> >> Derby message store fills up the disk drive until it runs out of space
> and
> >> in general the workaround to that has been to stop Qpid and remove the
> >> message store (i.e. the Derby directory), which loses data.
> >>
> >> My attempted solution to this was to extend Qpid's DerbyMessageStore
> with a
> >> DerbyRepackingMessageStore that executes the
> >> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However,
> on a
> >> test system with flowing data, this always has a deadlock exception and
> >> some data (the aborted transaction) is lost.  An example is:
> >>
> >> Caused by: org.apache.derby.shared.common.error.StandardException: A
> lock
> >> could not be obtained due to a deadlock, cycle of locks and waiters is:
> >> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
> >>    Waiting XID : {501214885, IX} , APP, INSERT INTO
> QPID_MESSAGE_METADATA(
> >> message_id , meta_data ) values (?, ?)
> >>    Granted XID : {501214829, X}
> >> Lock : ROW, SYSCONGLOMERATES, (7,8)
> >>    Waiting XID : {501214829, X} , APP, alter table
> >> "APP"."QPID_MESSAGE_METADATA" compress
> >>    Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> >> . The selected victim is XID : 501214885.
> >>          at
> >>
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
> >>          at
> >>
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
> >>          at
> >>
> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
> >>          at
> >>
> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
> >>
> >> My understanding of that is that the deadlock is occurring because the
> >> insert has a shared lock on the sysconglomerates row and wants an
> exclusive
> >> lock on the qpid_message_metadata table, while the compress has an
> >> exclusive lock on the qpid_message_metadata table and wants an exclusive
> >> lock on the sysconglomerates row, so they block each other.
> >>
> >> I found this thread from May of 2009 which had a similar issue:
> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
> >> expect the compress command to go for the lock(s), wait until it gets
> the
> >> lock(s), then block the other threads and do the compress.  But it
> >> deadlocks instead.
> >>
> >> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683,
> I
> >> tried using a non-autocommit connection and locking the table
> exclusively
> >> before calling compress and then commit, but that got the same errors.
> >> Based on this thread,
> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> >> the in place compress and it succeeds without deadlock.  However, it is
> >> significantly slower, too slow to be a workable solution for my use case
> >> and it also doesn't return as much space to the operating system.
> >>
> >> Does this look like a Derby bug where the compress command will
> deadlock if
> >> there are inserts happening at roughly the same time?  Or is this
> >> expected?  Do both the inserts and compress need a sysconglomerates row
> >> lock?  Is there a way the compress command could get both its locks at
> the
> >> same time to avoid the deadlock?
> >>
> >>
> >>
> >>
>
>

Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Posted by Rick Hillegas <ri...@gmail.com>.
I don't have any theories yet about what is going on. Can you try 
issuing the following statement just before you compress the table:

SELECT * FROM SYSCS_DIAG.LOCK_TABLE;

I don't understand why the INSERT transaction grabs a lock on a 
SYSCONGLOMERATES row.

Thanks,
-Rick


On 9/29/22 1:24 PM, Nathan Jensen wrote:
> Thanks for the response!  To clarify a few things, we're using Derby
> 10.15.2.0 and Java 11.  Our project is sticking with Java 11 for the time
> being so I can't upgrade to Derby 10.16 at this time.  The version of Qpid
> we're using at present is 7.1.12.
>
> I don't see anything suspicious regarding the INSERT statement, but maybe
> you could see something that I don't.  I believe the relevant code is here:
> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675
> That does not commit the transaction, the transaction is committed
> elsewhere in the same file, here:
> https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264
> I should also note that one time I saw the deadlock happen with a DELETE
> statement instead of an INSERT, so I don't think the unexpected row locking
> in SYSCONGLOMERATES is limited to INSERT.
>
> Here is the DDL as produced by Derby's dblook:
> -- ----------------------------------------------
> -- DDL Statements for tables
> -- ----------------------------------------------
>
> CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL,
> "MESSAGE_ID" BIGINT NOT NULL);
>
> CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL,
> "CONTENT" BLOB(2147483647));
>
> CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL,
> "GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64)
> FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID"
> VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);
>
> CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL,
> "REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647),
> "LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET" BLOB(2147483647));
>
> CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL,
> "META_DATA" BLOB(2147483647));
>
> CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL);
>
> CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID"
> VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA
> NOT NULL);
>
> CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT
> NULL, "VERSION_TIME" TIMESTAMP);
>
> -- ----------------------------------------------
> -- DDL Statements for keys
> -- ----------------------------------------------
>
> -- PRIMARY/UNIQUE
> ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT
> "SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("LINK_KEY");
>
> ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT
> "SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("MESSAGE_ID");
>
> ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT
> "SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("QUEUE_ID", "MESSAGE_ID");
>
> ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT
> "SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT",
> "GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID");
>
> ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT
> "SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT",
> "GLOBAL_ID", "BRANCH_ID");
>
> ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT
> "SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("MESSAGE_ID");
>
> ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT
> "SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY
> ("VERSION");
>
>
> And just in case I'm doing something stupid, here is the current version of
> my code.  conn.getAutoCommit() is false, but I also tried it with an
> autocommit connection and without the commit statements that time.  The
> code is inside a try-catch-finally that is inside a TimerTask.
>
> conn = newConnection();
> LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit());
> LOGGER.info("Compressing qpid_queue_entries");
> conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> 'QPID_QUEUE_ENTRIES', 0)").executeUpdate();
> conn.commit();
> LOGGER.info("Compressing qpid_message_metadata");
> conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> 'QPID_MESSAGE_METADATA', 0)").executeUpdate();
> conn.commit();
> LOGGER.info("Compressing qpid_message_content");
> conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
> 'QPID_MESSAGE_CONTENT', 0)").executeUpdate();
> conn.commit();
>
> Thanks for any help you can provide!
>
> On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <ri...@gmail.com>
> wrote:
>
>> I'm not an expert on the table compression code, but it does need an
>> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
>>
>> If I'm interpreting the error correctly, the table compressor is blocked
>> trying to escalate its lock on the SYSCONGLOMERATES row from shared to
>> exclusive. But it can't do this because the INSERT transaction holds a
>> shared lock on the SYSCONGLOMERATES row. I don't understand why the INSERT
>> transaction would need to lock a SYSCONGLOMERATES row. The following simple
>> experiment shows that, by itself, a simple INSERT should not lock any rows
>> in SYSCONGLOMERATES:
>>
>> ij version 10.17
>>
>> ij> CONNECT 'jdbc:derby:memory:db;create=true';
>>
>> ij> AUTOCOMMIT OFF;
>>
>> ij> CREATE TABLE t(a INT);
>>
>> 0 rows inserted/updated/deleted
>>
>> ij> CREATE INDEX t_idx ON t(a);
>>
>> 0 rows inserted/updated/deleted
>>
>> ij> CREATE TABLE s(a INT);
>>
>> 0 rows inserted/updated/deleted
>>
>> ij> INSERT INTO s VALUES (1);
>>
>> 1 row inserted/updated/deleted
>>
>> ij> COMMIT;
>>
>> ij> INSERT INTO t SELECT * FROM s;
>>
>> 1 row inserted/updated/deleted
>>
>> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>>
>> XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> 186            |TABLE|IX  |T                                                                                                                               |Tablelock           |GRANT|T        |2    |NULL
>>
>> 186            |ROW  |X   |T                                                                                                                               |(1,7)               |GRANT|T        |1    |NULL
>>
>> 2 rows selected
>>
>> ij> COMMIT;
>>
>> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>>
>> XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> 0 rows selected
>>
>>
>> Is there anything else going on in the INSERT transaction? Can you share
>> the table DDL with us, including indexes, constraints, auto-generated
>> columns, etc.?
>>
>> -Rick
>>
>>
>> On 9/28/22 2:12 PM, Nathan Jensen wrote:
>>
>> Hi, we are using Derby as the backing message store for Qpid-Broker-J.  We
>> have a continuous data flow into Qpid so each message gets persisted to the
>> Derby message store as data comes in and then it is deleted from the Derby
>> message store as data is picked up and processed.  The data flow is pretty
>> constant, there are peaks and valleys but basically data is always coming
>> in so there are LOTS of inserts and deletes.  Over time we found that the
>> Derby message store fills up the disk drive until it runs out of space and
>> in general the workaround to that has been to stop Qpid and remove the
>> message store (i.e. the Derby directory), which loses data.
>>
>> My attempted solution to this was to extend Qpid's DerbyMessageStore with a
>> DerbyRepackingMessageStore that executes the
>> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However, on a
>> test system with flowing data, this always has a deadlock exception and
>> some data (the aborted transaction) is lost.  An example is:
>>
>> Caused by: org.apache.derby.shared.common.error.StandardException: A lock
>> could not be obtained due to a deadlock, cycle of locks and waiters is:
>> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>>    Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
>> message_id , meta_data ) values (?, ?)
>>    Granted XID : {501214829, X}
>> Lock : ROW, SYSCONGLOMERATES, (7,8)
>>    Waiting XID : {501214829, X} , APP, alter table
>> "APP"."QPID_MESSAGE_METADATA" compress
>>    Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
>> . The selected victim is XID : 501214885.
>>          at
>> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
>>          at
>> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
>>          at
>> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
>>          at
>> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
>>
>> My understanding of that is that the deadlock is occurring because the
>> insert has a shared lock on the sysconglomerates row and wants an exclusive
>> lock on the qpid_message_metadata table, while the compress has an
>> exclusive lock on the qpid_message_metadata table and wants an exclusive
>> lock on the sysconglomerates row, so they block each other.
>>
>> I found this thread from May of 2009 which had a similar issue:https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
>> expect the compress command to go for the lock(s), wait until it gets the
>> lock(s), then block the other threads and do the compress.  But it
>> deadlocks instead.
>>
>> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683, I
>> tried using a non-autocommit connection and locking the table exclusively
>> before calling compress and then commit, but that got the same errors.
>> Based on this thread,https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
>> the in place compress and it succeeds without deadlock.  However, it is
>> significantly slower, too slow to be a workable solution for my use case
>> and it also doesn't return as much space to the operating system.
>>
>> Does this look like a Derby bug where the compress command will deadlock if
>> there are inserts happening at roughly the same time?  Or is this
>> expected?  Do both the inserts and compress need a sysconglomerates row
>> lock?  Is there a way the compress command could get both its locks at the
>> same time to avoid the deadlock?
>>
>>
>>
>>


Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Posted by Nathan Jensen <nd...@gmail.com>.
Thanks for the response!  To clarify a few things, we're using Derby
10.15.2.0 and Java 11.  Our project is sticking with Java 11 for the time
being so I can't upgrade to Derby 10.16 at this time.  The version of Qpid
we're using at present is 7.1.12.

I don't see anything suspicious regarding the INSERT statement, but maybe
you could see something that I don't.  I believe the relevant code is here:
https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L675
That does not commit the transaction, the transaction is committed
elsewhere in the same file, here:
https://github.com/apache/qpid-broker-j/blob/7.1.12/broker-plugins/jdbc-store/src/main/java/org/apache/qpid/server/store/jdbc/AbstractJDBCMessageStore.java#L1264
I should also note that one time I saw the deadlock happen with a DELETE
statement instead of an INSERT, so I don't think the unexpected row locking
in SYSCONGLOMERATES is limited to INSERT.

Here is the DDL as produced by Derby's dblook:
-- ----------------------------------------------
-- DDL Statements for tables
-- ----------------------------------------------

CREATE TABLE "APP"."QPID_QUEUE_ENTRIES" ("QUEUE_ID" VARCHAR(36) NOT NULL,
"MESSAGE_ID" BIGINT NOT NULL);

CREATE TABLE "APP"."QPID_MESSAGE_CONTENT" ("MESSAGE_ID" BIGINT NOT NULL,
"CONTENT" BLOB(2147483647));

CREATE TABLE "APP"."QPID_XID_ACTIONS" ("FORMAT" BIGINT NOT NULL,
"GLOBAL_ID" VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64)
FOR BIT DATA NOT NULL, "ACTION_TYPE" CHAR(1) NOT NULL, "QUEUE_ID"
VARCHAR(36) NOT NULL, "MESSAGE_ID" BIGINT NOT NULL);

CREATE TABLE "APP"."AMQP_1_0_LINKS" ("LINK_KEY" VARCHAR(44) NOT NULL,
"REMOTE_CONTAINER_ID" BLOB(2147483647), "LINK_NAME" BLOB(2147483647),
"LINK_ROLE" INTEGER, "SOURCE" BLOB(2147483647), "TARGET" BLOB(2147483647));

CREATE TABLE "APP"."QPID_MESSAGE_METADATA" ("MESSAGE_ID" BIGINT NOT NULL,
"META_DATA" BLOB(2147483647));

CREATE TABLE "APP"."QPID_DB_VERSION" ("VERSION" INTEGER NOT NULL);

CREATE TABLE "APP"."QPID_XIDS" ("FORMAT" BIGINT NOT NULL, "GLOBAL_ID"
VARCHAR (64) FOR BIT DATA NOT NULL, "BRANCH_ID" VARCHAR (64) FOR BIT DATA
NOT NULL);

CREATE TABLE "APP"."AMQP_1_0_LINKS_VERSION" ("VERSION" VARCHAR(10) NOT
NULL, "VERSION_TIME" TIMESTAMP);

-- ----------------------------------------------
-- DDL Statements for keys
-- ----------------------------------------------

-- PRIMARY/UNIQUE
ALTER TABLE "APP"."AMQP_1_0_LINKS" ADD CONSTRAINT
"SQL0000000086-b2280141-0183-5b33-aec6-000016c47528" PRIMARY KEY
("LINK_KEY");

ALTER TABLE "APP"."QPID_MESSAGE_METADATA" ADD CONSTRAINT
"SQL0000000082-e333810d-0183-5b33-aec6-000016c47528" PRIMARY KEY
("MESSAGE_ID");

ALTER TABLE "APP"."QPID_QUEUE_ENTRIES" ADD CONSTRAINT
"SQL0000000081-61228105-0183-5b33-aec6-000016c47528" PRIMARY KEY
("QUEUE_ID", "MESSAGE_ID");

ALTER TABLE "APP"."QPID_XID_ACTIONS" ADD CONSTRAINT
"SQL0000000085-69c68125-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT",
"GLOBAL_ID", "BRANCH_ID", "ACTION_TYPE", "QUEUE_ID", "MESSAGE_ID");

ALTER TABLE "APP"."QPID_XIDS" ADD CONSTRAINT
"SQL0000000084-e785811d-0183-5b33-aec6-000016c47528" PRIMARY KEY ("FORMAT",
"GLOBAL_ID", "BRANCH_ID");

ALTER TABLE "APP"."QPID_MESSAGE_CONTENT" ADD CONSTRAINT
"SQL0000000083-65548115-0183-5b33-aec6-000016c47528" PRIMARY KEY
("MESSAGE_ID");

ALTER TABLE "APP"."AMQP_1_0_LINKS_VERSION" ADD CONSTRAINT
"SQL0000000087-5555c14b-0183-5b33-aec6-000016c47528" PRIMARY KEY
("VERSION");


And just in case I'm doing something stupid, here is the current version of
my code.  conn.getAutoCommit() is false, but I also tried it with an
autocommit connection and without the commit statements that time.  The
code is inside a try-catch-finally that is inside a TimerTask.

conn = newConnection();
LOGGER.info("connection.getAutoCommit() is " + conn.getAutoCommit());
LOGGER.info("Compressing qpid_queue_entries");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
'QPID_QUEUE_ENTRIES', 0)").executeUpdate();
conn.commit();
LOGGER.info("Compressing qpid_message_metadata");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
'QPID_MESSAGE_METADATA', 0)").executeUpdate();
conn.commit();
LOGGER.info("Compressing qpid_message_content");
conn.prepareStatement("call SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP',
'QPID_MESSAGE_CONTENT', 0)").executeUpdate();
conn.commit();

Thanks for any help you can provide!

On Thu, Sep 29, 2022 at 1:28 PM Rick Hillegas <ri...@gmail.com>
wrote:

> I'm not an expert on the table compression code, but it does need an
> overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).
>
> If I'm interpreting the error correctly, the table compressor is blocked
> trying to escalate its lock on the SYSCONGLOMERATES row from shared to
> exclusive. But it can't do this because the INSERT transaction holds a
> shared lock on the SYSCONGLOMERATES row. I don't understand why the INSERT
> transaction would need to lock a SYSCONGLOMERATES row. The following simple
> experiment shows that, by itself, a simple INSERT should not lock any rows
> in SYSCONGLOMERATES:
>
> ij version 10.17
>
> ij> CONNECT 'jdbc:derby:memory:db;create=true';
>
> ij> AUTOCOMMIT OFF;
>
> ij> CREATE TABLE t(a INT);
>
> 0 rows inserted/updated/deleted
>
> ij> CREATE INDEX t_idx ON t(a);
>
> 0 rows inserted/updated/deleted
>
> ij> CREATE TABLE s(a INT);
>
> 0 rows inserted/updated/deleted
>
> ij> INSERT INTO s VALUES (1);
>
> 1 row inserted/updated/deleted
>
> ij> COMMIT;
>
> ij> INSERT INTO t SELECT * FROM s;
>
> 1 row inserted/updated/deleted
>
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 186            |TABLE|IX  |T                                                                                                                               |Tablelock           |GRANT|T        |2    |NULL
>
> 186            |ROW  |X   |T                                                                                                                               |(1,7)               |GRANT|T        |1    |NULL
>
> 2 rows selected
>
> ij> COMMIT;
>
> ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;
>
> XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 0 rows selected
>
>
> Is there anything else going on in the INSERT transaction? Can you share
> the table DDL with us, including indexes, constraints, auto-generated
> columns, etc.?
>
> -Rick
>
>
> On 9/28/22 2:12 PM, Nathan Jensen wrote:
>
> Hi, we are using Derby as the backing message store for Qpid-Broker-J.  We
> have a continuous data flow into Qpid so each message gets persisted to the
> Derby message store as data comes in and then it is deleted from the Derby
> message store as data is picked up and processed.  The data flow is pretty
> constant, there are peaks and valleys but basically data is always coming
> in so there are LOTS of inserts and deletes.  Over time we found that the
> Derby message store fills up the disk drive until it runs out of space and
> in general the workaround to that has been to stop Qpid and remove the
> message store (i.e. the Derby directory), which loses data.
>
> My attempted solution to this was to extend Qpid's DerbyMessageStore with a
> DerbyRepackingMessageStore that executes the
> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However, on a
> test system with flowing data, this always has a deadlock exception and
> some data (the aborted transaction) is lost.  An example is:
>
> Caused by: org.apache.derby.shared.common.error.StandardException: A lock
> could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>   Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
> message_id , meta_data ) values (?, ?)
>   Granted XID : {501214829, X}
> Lock : ROW, SYSCONGLOMERATES, (7,8)
>   Waiting XID : {501214829, X} , APP, alter table
> "APP"."QPID_MESSAGE_METADATA" compress
>   Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> . The selected victim is XID : 501214885.
>         at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
>         at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
>         at
> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
>         at
> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
>
> My understanding of that is that the deadlock is occurring because the
> insert has a shared lock on the sysconglomerates row and wants an exclusive
> lock on the qpid_message_metadata table, while the compress has an
> exclusive lock on the qpid_message_metadata table and wants an exclusive
> lock on the sysconglomerates row, so they block each other.
>
> I found this thread from May of 2009 which had a similar issue:https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f  I would
> expect the compress command to go for the lock(s), wait until it gets the
> lock(s), then block the other threads and do the compress.  But it
> deadlocks instead.
>
> Based on this ticket, https://issues.apache.org/jira/browse/DERBY-3683, I
> tried using a non-autocommit connection and locking the table exclusively
> before calling compress and then commit, but that got the same errors.
> Based on this thread,https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> the in place compress and it succeeds without deadlock.  However, it is
> significantly slower, too slow to be a workable solution for my use case
> and it also doesn't return as much space to the operating system.
>
> Does this look like a Derby bug where the compress command will deadlock if
> there are inserts happening at roughly the same time?  Or is this
> expected?  Do both the inserts and compress need a sysconglomerates row
> lock?  Is there a way the compress command could get both its locks at the
> same time to avoid the deadlock?
>
>
>
>

Re: deadlock when running SYSCS_UTIL.SYSCS_COMPRESS_TABLE

Posted by Rick Hillegas <ri...@gmail.com>.
I'm not an expert on the table compression code, but it does need an 
overhaul (see https://issues.apache.org/jira/browse/DERBY-3683).

If I'm interpreting the error correctly, the table compressor is blocked 
trying to escalate its lock on the SYSCONGLOMERATES row from shared to 
exclusive. But it can't do this because the INSERT transaction holds a 
shared lock on the SYSCONGLOMERATES row. I don't understand why the 
INSERT transaction would need to lock a SYSCONGLOMERATES row. The 
following simple experiment shows that, by itself, a simple INSERT 
should not lock any rows in SYSCONGLOMERATES:

ij version 10.17

ij> CONNECT 'jdbc:derby:memory:db;create=true';

ij> AUTOCOMMIT OFF;

ij> CREATE TABLE t(a INT);

0 rows inserted/updated/deleted

ij> CREATE INDEX t_idx ON t(a);

0 rows inserted/updated/deleted

ij> CREATE TABLE s(a INT);

0 rows inserted/updated/deleted

ij> INSERT INTO s VALUES (1);

1 row inserted/updated/deleted

ij> COMMIT;

ij> INSERT INTO t SELECT * FROM s;

1 row inserted/updated/deleted

ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;

XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

186            |TABLE|IX  |T                                                                                                                               |Tablelock           |GRANT|T        |2    |NULL

186            |ROW  |X   |T                                                                                                                               |(1,7)               |GRANT|T        |1    |NULL

2 rows selected

ij> COMMIT;

ij> SELECT * FROM SYSCS_DIAG.LOCK_TABLE;

XID            |TYPE |MODE|TABLENAME                                                                                                                       |LOCKNAME            |STATE|TABLETYPE|LOCK&|INDEXNAME

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

0 rows selected


Is there anything else going on in the INSERT transaction? Can you share 
the table DDL with us, including indexes, constraints, auto-generated 
columns, etc.?

-Rick


On 9/28/22 2:12 PM, Nathan Jensen wrote:
> Hi, we are using Derby as the backing message store for Qpid-Broker-J.  We
> have a continuous data flow into Qpid so each message gets persisted to the
> Derby message store as data comes in and then it is deleted from the Derby
> message store as data is picked up and processed.  The data flow is pretty
> constant, there are peaks and valleys but basically data is always coming
> in so there are LOTS of inserts and deletes.  Over time we found that the
> Derby message store fills up the disk drive until it runs out of space and
> in general the workaround to that has been to stop Qpid and remove the
> message store (i.e. the Derby directory), which loses data.
>
> My attempted solution to this was to extend Qpid's DerbyMessageStore with a
> DerbyRepackingMessageStore that executes the
> SYSCS_UTIL.SYSCS_COMPRESS_TABLE procedure based on a timer.  However, on a
> test system with flowing data, this always has a deadlock exception and
> some data (the aborted transaction) is lost.  An example is:
>
> Caused by: org.apache.derby.shared.common.error.StandardException: A lock
> could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : TABLE, QPID_MESSAGE_METADATA, Tablelock
>    Waiting XID : {501214885, IX} , APP, INSERT INTO QPID_MESSAGE_METADATA(
> message_id , meta_data ) values (?, ?)
>    Granted XID : {501214829, X}
> Lock : ROW, SYSCONGLOMERATES, (7,8)
>    Waiting XID : {501214829, X} , APP, alter table
> "APP"."QPID_MESSAGE_METADATA" compress
>    Granted XID : {501214829, S} , {501214865, S} , {501214885, S}
> . The selected victim is XID : 501214885.
>          at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:300)
>          at
> org.apache.derby.shared.common.error.StandardException.newException(StandardException.java:295)
>          at
> org.apache.derby.impl.services.locks.Deadlock.buildException(Deadlock.java:557)
>          at
> org.apache.derby.impl.services.locks.ConcurrentLockSet.lockObject(ConcurrentLockSet.java:647)
>
> My understanding of that is that the deadlock is occurring because the
> insert has a shared lock on the sysconglomerates row and wants an exclusive
> lock on the qpid_message_metadata table, while the compress has an
> exclusive lock on the qpid_message_metadata table and wants an exclusive
> lock on the sysconglomerates row, so they block each other.
>
> I found this thread from May of 2009 which had a similar issue:
> https://lists.apache.org/thread/b8p7kzcc0clo45o2drs5rttsx5gqq49f   I would
> expect the compress command to go for the lock(s), wait until it gets the
> lock(s), then block the other threads and do the compress.  But it
> deadlocks instead.
>
> Based on this ticket,https://issues.apache.org/jira/browse/DERBY-3683, I
> tried using a non-autocommit connection and locking the table exclusively
> before calling compress and then commit, but that got the same errors.
> Based on this thread,
> https://lists.apache.org/thread/38qg7nkqolssc1jhdkhgnrl7vbkwq0vw, I tried
> the in place compress and it succeeds without deadlock.  However, it is
> significantly slower, too slow to be a workable solution for my use case
> and it also doesn't return as much space to the operating system.
>
> Does this look like a Derby bug where the compress command will deadlock if
> there are inserts happening at roughly the same time?  Or is this
> expected?  Do both the inserts and compress need a sysconglomerates row
> lock?  Is there a way the compress command could get both its locks at the
> same time to avoid the deadlock?
>