Posted to dev@ignite.apache.org by "Semen Boikov (JIRA)" <ji...@apache.org> on 2015/11/03 10:00:35 UTC

[jira] [Created] (IGNITE-1843) Hang in GridJobProcessor on node stop

Semen Boikov created IGNITE-1843:
------------------------------------

             Summary: Hang in GridJobProcessor on node stop
                 Key: IGNITE-1843
                 URL: https://issues.apache.org/jira/browse/IGNITE-1843
             Project: Ignite
          Issue Type: Bug
          Components: compute
            Reporter: Semen Boikov
            Assignee: Semen Boikov
            Priority: Blocker
             Fix For: 1.5


Observed a hang on node stop in GridTaskFailoverAffinityRunTest.testNodeRestart. The thread dump shows three threads involved.

GridJobProcessor.onKernalStop tries to block further operations and waits to acquire the write lock:
{noformat}
[11:42:06] :		 [org.apache.ignite:ignite-core] Thread [name="restart-thread-2", id=27562, state=TIMED_WAITING, blockCnt=885, waitCnt=27131]
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.lang.Thread.sleep(Native Method)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.util.GridSpinReadWriteLock.writeLock(GridSpinReadWriteLock.java:210)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.processors.job.GridJobProcessor.onKernalStop(GridJobProcessor.java:277)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:1824)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:1770)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2133)
[11:42:06] :		 [org.apache.ignite:ignite-core]         - locked o.a.i.i.IgnitionEx$IgniteNamedInstance@6602227a
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2096)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:314)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.Ignition.stop(Ignition.java:223)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:802)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1060)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.GridTaskFailoverAffinityRunTest.access$000(GridTaskFailoverAffinityRunTest.java:44)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.GridTaskFailoverAffinityRunTest$1.call(GridTaskFailoverAffinityRunTest.java:121)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.testframework.GridTestThread.run(GridTestThread.java:86)
{noformat}

The discovery listener thread is blocked trying to acquire the read lock:
{noformat}
[11:42:06] :		 [org.apache.ignite:ignite-core] Thread [name="disco-event-worker-#26595%internal.GridTaskFailoverAffinityRunTest2%", id=32093, state=TIMED_WAITING, blockCnt=0, waitCnt=25843]
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.lang.Thread.sleep(Native Method)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.util.GridSpinReadWriteLock.readLock(GridSpinReadWriteLock.java:101)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.processors.job.GridJobProcessor$JobDiscoveryListener.onEvent(GridJobProcessor.java:1854)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:770)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:755)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:295)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:1949)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2156)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:1989)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.lang.Thread.run(Thread.java:745)
{noformat}

A 'get' from the marshaller cache hangs while the GridJobProcessor read lock is held. It probably depends on a discovery event, which cannot be delivered while the discovery thread is blocked:
{noformat}
[11:42:06] :		 [org.apache.ignite:ignite-core] Thread [name="ignite-#26553%pub-internal.GridTaskFailoverAffinityRunTest2%", id=32037, state=WAITING, blockCnt=2, waitCnt=3]
[11:42:06] :		 [org.apache.ignite:ignite-core]     Lock [object=o.a.i.i.processors.cache.distributed.dht.GridPartitionedGetFuture@30ae29ee, ownerName=null, ownerId=-1]
[11:42:06] :		 [org.apache.ignite:ignite-core]         at sun.misc.Unsafe.park(Native Method)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.GridCacheAdapter.getTopologySafe(GridCacheAdapter.java:1312)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.MarshallerContextImpl.className(MarshallerContextImpl.java:151)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.MarshallerContextAdapter.getClass(MarshallerContextAdapter.java:174)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.marshaller.optimized.OptimizedMarshallerUtils.classDescriptor(OptimizedMarshallerUtils.java:257)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:309)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:364)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.processors.closure.GridClosureProcessor$C2.readExternal(GridClosureProcessor.java:1808)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.marshaller.optimized.OptimizedObjectInputStream.readExternalizable(OptimizedObjectInputStream.java:514)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:803)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:315)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:364)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.marshaller.optimized.OptimizedMarshaller.unmarshal(OptimizedMarshaller.java:248)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.processors.job.GridJobWorker.initialize(GridJobWorker.java:409)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1094)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1776)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:811)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.communication.GridIoManager.access$1500(GridIoManager.java:106)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at o.a.i.i.managers.communication.GridIoManager$5.run(GridIoManager.java:774)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[11:42:06] :		 [org.apache.ignite:ignite-core]         at java.lang.Thread.run(Thread.java:745)
{noformat}
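
The interaction between the three threads above can be reproduced in isolation. The following is only a minimal sketch, not Ignite code: it assumes that GridSpinReadWriteLock makes new readers wait once a writer is pending (which is what the dumps suggest), and it substitutes a fair java.util.concurrent ReentrantReadWriteLock for it. The class name StopHangSketch, the variable names, and the timeouts are all hypothetical. Timeouts are used so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the three threads in the dumps; not Ignite code.
class StopHangSketch {
    /** Returns true if the "discovery" reader could not get the read lock while a writer was queued. */
    static boolean discoveryReaderStarved() {
        // Fair mode: a queued writer makes later readers wait (assumed to mirror the spin lock).
        ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock(true);
        // Stands in for the marshaller cache 'get' that needs a discovery event to complete.
        CompletableFuture<String> marshallerGet = new CompletableFuture<>();
        CountDownLatch readLockHeld = new CountDownLatch(1);

        // "Job" thread: holds the read lock while waiting for the cache 'get'.
        Thread job = new Thread(() -> {
            busyLock.readLock().lock();
            try {
                readLockHeld.countDown();
                marshallerGet.get(5, TimeUnit.SECONDS);
            } catch (Exception ignored) {
                // Timeout or interruption: just release the lock below.
            } finally {
                busyLock.readLock().unlock();
            }
        });

        // "Stop" thread: waits for the write lock, as onKernalStop does.
        Thread stopper = new Thread(() -> {
            try {
                if (busyLock.writeLock().tryLock(5, TimeUnit.SECONDS))
                    busyLock.writeLock().unlock();
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            job.start();
            readLockHeld.await();
            stopper.start();
            // Wait until the writer is actually queued behind the reader.
            while (!busyLock.hasQueuedThread(stopper))
                Thread.sleep(10);

            // "Discovery" thread (the caller): needs the read lock before it can complete the 'get'.
            boolean acquired = busyLock.readLock().tryLock(300, TimeUnit.MILLISECONDS);
            if (acquired)
                busyLock.readLock().unlock();

            // Break the cycle by hand so the sketch can exit.
            marshallerGet.complete("SomeClass");
            job.join();
            stopper.join();
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("discovery reader starved: " + discoveryReaderStarved());
    }
}
```

In the sketch the cycle is broken only because the future is completed by hand after the timed read-lock attempt gives up. In the reported hang nothing plays that role: the reader waits for the writer, the writer waits for the job thread, and the job thread waits for the reader.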


