Posted to issues@activemq.apache.org by "Justin Bertram (Jira)" <ji...@apache.org> on 2020/06/17 13:51:00 UTC

[jira] [Updated] (ARTEMIS-2807) Avoid notifications on critical IO error

     [ https://issues.apache.org/jira/browse/ARTEMIS-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Bertram updated ARTEMIS-2807:
------------------------------------
    Description: 
When the broker hits a "critical" IO error it will shut itself down. However, during the shutdown process multiple notifications are sent. These notifications trigger disk IO, which can delay (and potentially hang) shutdown. Here's an example from a thread dump of a broker hung in shutdown:
  
{noformat}
"Thread-11" #73 prio=5 os_prio=0 tid=0x00007fa3fc002800 nid=0x1907 waiting on condition [0x00007fa48d60d000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000009b1055f0> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
	at org.apache.activemq.artemis.core.journal.impl.SimpleWaitIOCallback.waitCompletion(SimpleWaitIOCallback.java:61)
	at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendAddRecord(JournalBase.java:52)
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendAddRecord(JournalImpl.java:93)
	at org.apache.activemq.artemis.core.journal.Journal.appendAddRecord(Journal.java:65)
	at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.storeID(AbstractJournalStorageManager.java:805)
	at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.storeID(BatchingIDGenerator.java:147)
	at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.saveCheckPoint(BatchingIDGenerator.java:132)
	- locked <0x0000000090f23850> (a org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator)
	at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.generateID(BatchingIDGenerator.java:111)
	at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.generateID(AbstractJournalStorageManager.java:334)
	at org.apache.activemq.artemis.core.server.management.impl.ManagementServiceImpl.sendNotification(ManagementServiceImpl.java:678)
	- locked <0x0000000090b0ab20> (a java.lang.Object)
	- locked <0x0000000090f21550> (a org.apache.activemq.artemis.core.server.management.impl.ManagementServiceImpl)
	at org.apache.activemq.artemis.core.server.cluster.impl.BroadcastGroupImpl.stop(BroadcastGroupImpl.java:142)
	- locked <0x0000000090b0b650> (a org.apache.activemq.artemis.core.server.cluster.impl.BroadcastGroupImpl)
	at org.apache.activemq.artemis.core.server.cluster.ClusterManager.stop(ClusterManager.java:310)
	- locked <0x0000000090b0b508> (a org.apache.activemq.artemis.core.server.cluster.ClusterManager)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stopComponent(ActiveMQServerImpl.java:1355)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1090)
	- locked <0x0000000090f1d128> (a org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1054)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5.run(ActiveMQServerImpl.java:860)
{noformat}
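
The trace above shows sendNotification() asking the storage manager for a new ID, which in turn waits on a journal write. One possible way to avoid that is to skip notifications once a critical IO error has been flagged. The following is only a minimal sketch of that idea, not the actual Artemis change: every class, method and notification name in it (NotificationGuardSketch, JournalBackedIdGenerator, ManagementService, onCriticalIoError) is hypothetical, and the ID generator is simplified to count one journal write per ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these types are the real Artemis classes.
class NotificationGuardSketch {

    /** Stand-in for an ID generator whose IDs are persisted to the journal. */
    static final class JournalBackedIdGenerator {
        private final AtomicLong nextId = new AtomicLong();
        private final AtomicLong journalWrites = new AtomicLong();

        long generateId() {
            // Simplification: treat every ID as one journal (disk) write.
            // In the thread dump this is the step that blocks.
            journalWrites.incrementAndGet();
            return nextId.incrementAndGet();
        }

        long journalWrites() {
            return journalWrites.get();
        }
    }

    /** Stand-in for a management service that sends notifications. */
    static final class ManagementService {
        private final JournalBackedIdGenerator ids;
        private final AtomicBoolean criticalIoError = new AtomicBoolean(false);
        private final List<String> sent = new ArrayList<>();

        ManagementService(JournalBackedIdGenerator ids) {
            this.ids = ids;
        }

        /** Assumed hook: called when the broker detects a critical IO error. */
        void onCriticalIoError() {
            criticalIoError.set(true);
        }

        /** Returns true if the notification was sent, false if it was skipped. */
        synchronized boolean sendNotification(String type) {
            if (criticalIoError.get()) {
                // Skip before touching the ID generator, so no disk IO is attempted.
                return false;
            }
            long id = ids.generateId();
            sent.add(id + ":" + type);
            return true;
        }

        synchronized List<String> sentNotifications() {
            return new ArrayList<>(sent);
        }
    }

    public static void main(String[] args) {
        JournalBackedIdGenerator ids = new JournalBackedIdGenerator();
        ManagementService management = new ManagementService(ids);

        management.sendNotification("BROADCAST_GROUP_STARTED");
        management.onCriticalIoError();
        boolean sent = management.sendNotification("BROADCAST_GROUP_STOPPED");

        System.out.println("sent after critical error: " + sent);
        System.out.println("journal writes: " + ids.journalWrites());
    }
}
```

In this sketch the check happens before the ID is generated, so a component being stopped during shutdown never reaches the journal once the flag is set. The second sendNotification() call returns false and the journal write count stays at 1.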

  was:
When the broker hits a "critical" IO error it will shut itself down. However, during the shutdown process multiple notifications are sent. These notifications trigger disk IO which can delay shutdown. Here's an example from a thread-dump from a broker hung in shutdown:
 
{noformat}
"Thread-11" #73 prio=5 os_prio=0 tid=0x00007fa3fc002800 nid=0x1907 waiting on condition [0x00007fa48d60d000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000009b1055f0> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
	at org.apache.activemq.artemis.core.journal.impl.SimpleWaitIOCallback.waitCompletion(SimpleWaitIOCallback.java:61)
	at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendAddRecord(JournalBase.java:52)
	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendAddRecord(JournalImpl.java:93)
	at org.apache.activemq.artemis.core.journal.Journal.appendAddRecord(Journal.java:65)
	at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.storeID(AbstractJournalStorageManager.java:805)
	at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.storeID(BatchingIDGenerator.java:147)
	at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.saveCheckPoint(BatchingIDGenerator.java:132)
	- locked <0x0000000090f23850> (a org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator)
	at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.generateID(BatchingIDGenerator.java:111)
	at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.generateID(AbstractJournalStorageManager.java:334)
	at org.apache.activemq.artemis.core.server.management.impl.ManagementServiceImpl.sendNotification(ManagementServiceImpl.java:678)
	- locked <0x0000000090b0ab20> (a java.lang.Object)
	- locked <0x0000000090f21550> (a org.apache.activemq.artemis.core.server.management.impl.ManagementServiceImpl)
	at org.apache.activemq.artemis.core.server.cluster.impl.BroadcastGroupImpl.stop(BroadcastGroupImpl.java:142)
	- locked <0x0000000090b0b650> (a org.apache.activemq.artemis.core.server.cluster.impl.BroadcastGroupImpl)
	at org.apache.activemq.artemis.core.server.cluster.ClusterManager.stop(ClusterManager.java:310)
	- locked <0x0000000090b0b508> (a org.apache.activemq.artemis.core.server.cluster.ClusterManager)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stopComponent(ActiveMQServerImpl.java:1355)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1090)
	- locked <0x0000000090f1d128> (a org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1054)
	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5.run(ActiveMQServerImpl.java:860)
{noformat}


> Avoid notifications on critical IO error
> ----------------------------------------
>
>                 Key: ARTEMIS-2807
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2807
>             Project: ActiveMQ Artemis
>          Issue Type: Improvement
>            Reporter: Justin Bertram
>            Assignee: Justin Bertram
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h


