You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@activemq.apache.org by "Ben D'Herville (JIRA)" <ji...@apache.org> on 2018/11/21 22:48:00 UTC

[jira] [Updated] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand

     [ https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben D'Herville updated AMQ-7108:
--------------------------------
    Description: 
*Issue*

We have been running the stack with the same stated versions mentioned in this ticket's environment details for over a year.  Recently we have experienced ActiveMQ outages as a result of the exception below.  There has been a increase in load but not significant.

Scheduled messages are being used as a polling mechanism in a workflow system.  A workflow will execute a step then schedule a small object message, containing just an identifier, to check for a result in a number of seconds.  The consumer of the message will schedule another message if the task is not complete yet.  We also have a batch process that can start a number of workflows concurrently.  Usually in the order of 100's which means 100's of scheduled messages can be created concurrently.

The most recent failure occurred with a batch of only 64.

We haven't noticed any data loss as a result of this but it doesn't cause a complete outage of our production system.

*Recovery*

The only way we have been able to recover from this error is to restart the broker.  kahadb is configured to check for corrupt journal which would recover the journal on startup.  It is unclear if the journal is corrupted though.

*Reproduction*

We have tried to reproduce this issue using the ActiveMQ performance maven plugin but was unable to.  We had 10000 producers concurrently schedule 500 messages each.  

 

{{Note there are two stack traces here.  The second one is possibly a side-effect of the first one.}}
{{ 2018-Nov-21 15:16:48 JobScheduler:JMS [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed to schedule job}}
{{ java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand}}
{{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)}}
{{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)}}
{{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)}}
{{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)}}
{{ at java.lang.Thread.run(Thread.java:748)}}{{2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:java.lang.NullPointerException}}
{{ at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)}}
{{ at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)}}
{{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)}}
{{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)}}
{{ at org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)}}
{{ at org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)}}
{{ at org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)}}
{{ at org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)}}
{{ at org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)}}
{{ at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)}}
{{ at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
{{ at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)}}
{{ at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)}}
{{ at org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)}}
{{ at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)}}
{{ at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)}}

  was:
*Issue*

We have been running the stack with the same stated versions mentioned in this ticket's environment details for over a year.  Recently we have experienced ActiveMQ outages as a result of the exception below.  There has been a increase in load but not significant.

Scheduled messages are being used as a polling mechanism in a workflow system.  A workflow will execute a step then schedule a small object message, containing just an identifier, to check for a result in a number of seconds.  The consumer of the message will schedule another message if the task is not complete yet.  We also have a batch process that can start a number of workflows concurrently.  Usually in the order of 100's which means 100's of scheduled messages can be created concurrently.

The most recent failure occurred with a batch of only 64.

We haven't noticed any data loss as a result of this but it doesn't cause a complete outage of our production system.

*Recovery*

The only way we have been able to recover from this error is to restart the broker.  kahadb is configured to check for corrupt journal which would recover the journal on startup.  It is unclear if the journal is corrupted though.

*Reproduction*

We have tried to reproduce this issue using the ActiveMQ performance maven plugin but was unable to.  We had 10000 producers concurrently schedule 500 messages each.  

 

Note there are two stack traces here.  The second one is possibly a side-effect of the first one.
2018-Nov-21 15:16:48 JobScheduler:JMS [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed to schedule job
java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
        at java.lang.Thread.run(Thread.java:748)

2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:java.lang.NullPointerException
        at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
        at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
        at org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
        at org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
        at org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
        at org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
        at org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
        at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
        at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
        at org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
        at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
        at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)


> Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-7108
>                 URL: https://issues.apache.org/jira/browse/AMQ-7108
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Job Scheduler, KahaDB
>    Affects Versions: 5.11.1
>         Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> *AMQ*: 5.11.1
> *glusterfs*: 3.8.15
> *java*: 1.8.0_144-b01
>  
>            Reporter: Ben D'Herville
>            Priority: Major
>         Attachments: activemq.xml
>
>
> *Issue*
> We have been running the stack with the same stated versions mentioned in this ticket's environment details for over a year.  Recently we have experienced ActiveMQ outages as a result of the exception below.  There has been a increase in load but not significant.
> Scheduled messages are being used as a polling mechanism in a workflow system.  A workflow will execute a step then schedule a small object message, containing just an identifier, to check for a result in a number of seconds.  The consumer of the message will schedule another message if the task is not complete yet.  We also have a batch process that can start a number of workflows concurrently.  Usually in the order of 100's which means 100's of scheduled messages can be created concurrently.
> The most recent failure occurred with a batch of only 64.
> We haven't noticed any data loss as a result of this but it doesn't cause a complete outage of our production system.
> *Recovery*
> The only way we have been able to recover from this error is to restart the broker.  kahadb is configured to check for corrupt journal which would recover the journal on startup.  It is unclear if the journal is corrupted though.
> *Reproduction*
> We have tried to reproduce this issue using the ActiveMQ performance maven plugin but was unable to.  We had 10000 producers concurrently schedule 500 messages each.  
>  
> {{Note there are two stack traces here.  The second one is possibly a side-effect of the first one.}}
> {{ 2018-Nov-21 15:16:48 JobScheduler:JMS [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed to schedule job}}
> {{ java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand}}
> {{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)}}
> {{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)}}
> {{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)}}
> {{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}{{2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:java.lang.NullPointerException}}
> {{ at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)}}
> {{ at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)}}
> {{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)}}
> {{ at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)}}
> {{ at org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)}}
> {{ at org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)}}
> {{ at org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)}}
> {{ at org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)}}
> {{ at org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)}}
> {{ at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)}}
> {{ at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{ at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)}}
> {{ at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)}}
> {{ at org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)}}
> {{ at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)}}
> {{ at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)