Posted to issues@activemq.apache.org by "Justin Bertram (Jira)" <ji...@apache.org> on 2022/11/10 19:52:00 UTC

[jira] [Commented] (ARTEMIS-3992) Store corruption and broker instability with rollback of XA transactions

    [ https://issues.apache.org/jira/browse/ARTEMIS-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631896#comment-17631896 ] 

Justin Bertram commented on ARTEMIS-3992:
-----------------------------------------

I'm not sure where/when the issue was fixed as I'm not sure what the root cause actually was.

Aside from that, where are we on this issue? Given the lack of activity I assume everything is working as expected now. Can you confirm?

> Store corruption and broker instability with rollback of XA transactions
> ------------------------------------------------------------------------
>
>                 Key: ARTEMIS-3992
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3992
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.16.0
>            Reporter: SL
>            Priority: Major
>
> Edit: I had bad information about the timing of the upgrade to 2.24.0. The issue recurred just before the upgrade, so its final status is still pending.
> We are experiencing a major stability issue with Artemis which seems to be triggered by expired XA transactions.
> It starts with a bunch of timeouts like:
> {noformat}
> 2022-09-13 00:00:02,970 WARN  [org.apache.activemq.artemis.core.server] AMQ222103: transaction with xid XidImpl (2133539424 (...) timed out
> {noformat}
> Then come a lot of recurring exceptions on the persistent store:
> {noformat}
> MQ222055: Error on deleting duplicate cache: java.lang.IllegalStateException: Cannot find add info 228196096 on compactor or current records
>         at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1152) [artemis-journal-2.16.0.jar:2.16.0]
>         at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:989) [artemis-journal-2.16.0.jar:2.16.0]
>         at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.deleteDuplicateID(AbstractJournalStorageManager.java:482) [artemis-server-2.16.0.jar:2.16.0]
>         at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl.addToCacheInMemory(DuplicateIDCacheImpl.java:265) [artemis-server-2.16.0.jar:2.16.0]
>         at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl.access$000(DuplicateIDCacheImpl.java:41) [artemis-server-2.16.0.jar:2.16.0]
>         at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl$AddDuplicateIDOperation.process(DuplicateIDCacheImpl.java:347) [artemis-server-2.16.0.jar:2.16.0]
>         at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl$AddDuplicateIDOperation.beforeCommit(DuplicateIDCacheImpl.java:363) [artemis-server-2.16.0.jar:2.16.0]
>         at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.beforeCommit(TransactionImpl.java:599) [artemis-server-2.16.0
> {noformat}
> From the client side, consumption seems to slow down and at some point stops completely.
> The broker can partially recover with a restart, but it still seems to have issues unless it is given a new, clean, and empty persistent store.
> (Note: it might be similar to ARTEMIS-2373.)
> Background:
> - It's a standalone Artemis instance serving as a front for other brokers (connected by bridges, which work fine). It forwards messages submitted by clients to brokers connected to application services and returns the response messages, which are consumed by the clients (basically a kind of reverse proxy).
> - It was recently upgraded to 2.24.0 in the hope that this would fix the issue, but the behavior remains identical.
> - It's a production system. The issue has not yet been reproduced in test environments (but it has recurred several times in this production environment).
> - We do not own the client trying to consume the messages and have little information on the specifics of its internals and XA usage.
> - Clients not using XA have used the services for months, even years, without exhibiting this kind of issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)